Preface


Why another book on option pricing and why the choice of R language? The R
language is increasingly accepted by so-called ‘quants’ as the basic infrastructure
for financial applications. A growing number of projects, papers and conferences
are either R-centric or, at least, deal with R solutions. R itself may not initially
be very friendly, but, on the other hand, are stochastic integrals, martingales,
and Lévy processes that friendly? In addition to this argument, we should take
into account the famous quote from the R community by Greg Snow which
describes the correct approach to R but equally applies to the Itô integral and to
mathematical finance in general:

When talking about user friendliness of computer software I like the 
analogy of cars vs. busses: Busses are very easy to use, you just need 
to know which bus to get on, where to get on, and where to get off 
(and you need to pay your fare). Cars on the other hand require much 
more work, you need to have some type of map or directions (even if 
the map is in your head), you need to put gas in every now and then, 
you need to know the rules of the road (have some type of driver’s 
licence). The big advantage of the car is that it can take you a bunch 
of places that the bus does not go and it is quicker for some trips 
that would require transferring between busses. R is a 4-wheel drive 
SUV (though environmentally friendly) with a bike on the back, a 
kayak on top, good walking and running shoes in the passenger seat, 
and mountain climbing and spelunking gear in the back. R can take 
you anywhere you want to go if you take time to learn how to use 
the equipment, but that is going to take longer than learning where 
the bus stops are in a point-and-click GUI. 

This book aims to present an indication of what is going on in modern finance
and how this can be quickly implemented in a general computational framework
like R, rather than providing highly optimized, ad hoc solutions in lower-level
programming languages. For example, this book tries to explain how to simulate and
calibrate models describing financial assets by general methods and generic
functions rather than offering a series of highly specialized functions. Of course, the
code in the book tries to be efficient while being general, and some hints are
given in the direction of further optimization when needed.

The choice of the R language is motivated both by the fact that the author is
one of the developers of the R Project and by the fact that R, being open source, is
transparent: it is always possible to see how numerical results are obtained
without being deterred by the ‘black box’ of a commercial product. Moreover, the R
community, which is made up of users and developers who in many cases are
researchers in the field, carries out a continuous refereeing process on the code. This has been
one of the reasons why R has gained so much popularity in the last few years,
but this is not without cost (the ‘no free lunch’ aspect of R), in particular because
most R software is given under the disclaimer ‘use at your own risk’ and because,
in general, there is no commercial support for R software, although one can easily
experience peer-to-peer support from public mailing lists. This situation is
also changing nowadays because an increasing number of companies are selling
services and support for R-related software, in particular in finance and genetics.

When passing from the theory of mathematical finance to applied finance, 
many details should be taken into account such as handling the dates and times, 
the source of the time series in use, the time spent in running a simulation etc. 
This book tries to stay at this level rather than give a very abstract formulation of the
problems and solutions, while still trying to present the mathematical models in 
a proper form to stimulate further reading and study. 

The mathematics in this book is necessarily kept to a minimum for reasons 
of space and to keep the focus on the description and implementation of a wider 
class of models and estimation techniques. Indeed, while it is true that most 
mathematical papers contain a section on numerical results and empirical analysis, 
very few textbooks discuss these topics for models outside the standard Black 
and Scholes world. 

The first chapters of the book provide a more in-depth treatment with exercises 
and examples from basic probability theory and statistics, because they rely on 
the basic instruments of calculus an average student (e.g. in economics) should 
know. They also contain several results without proof (such as inequalities), 
which will be used to sketch the proofs of the more advanced parts of the 
book. The second part of the book only touches the surface of mathematical 
abstraction and provides sketches of the proofs when the mathematical details 
are too technical, but still tries to give the correct indication of why the level of 
mathematical abstraction is needed. So the first part can be used by students in 
finance as a summary and the second part as the main section of the book. It 
is assumed that readers are familiar with R, but a summary of what they need
to know to understand this book is contained in the two appendices as well as 
some general description of what is available and up-to-date in R in the context 
of finance. 

So, back to Snow’s quote: this book is more a car than a bus, but maybe with 
automatic gears and a solar-power engine, rather than a sport car with completely 
manual gears that requires continuous refueling and tuning. 



A big and long-lasting smile is dedicated to my beloved Ilia, Ludovico 
and Lucia, for the time I spent away from them during the preparation of this 
manuscript. As V. Borges said once, ‘a smile is the shortest distance between 
two persons’. 

S.M. Iacus 
November 2010 
1 A synthetic view


Mathematical finance has been an exponentially growing field of research in the 
last decades and is still impressively active. There are also many directions and 
subfields under the hat of ‘finance’ and researchers from very different fields, 
such as economics (of course), engineering, mathematics, numerical analysis and 
recently statistics, have been involved in this area. 

This chapter is intended to give guidance on the reading of the book and
to provide a better focus on the topics discussed herein. The book is intended to 
be self-contained in its exposition, introducing all the concepts, including very 
preliminary ones, which are required to better understand more complex topics 
and to appreciate the details and the beauty of some of the results. 

This book is also very computer-oriented and it often moves from theory to 
applications and examples. The R statistical environment has been chosen as a 
basis. All the code presented in this book is free and available as an R statistical 
package called opefimor on CRAN. 1

There are many good publications on mathematical finance on the market.
Some of them consider only mathematical aspects of the matter at different levels
of complexity. Other books that mix theoretical results and software applications
are usually based on copyright-protected software. These publications touch upon
the problem of model calibration only incidentally and, with notable exceptions,
the focus is mainly on discrete time models (ARCH, GARCH, etc.).

The main topics of this book are the description of models for asset dynamics
and interest rates along with their statistical calibration. In particular, the
attention is on continuous time models observed at discrete times and calibration
techniques for them in terms of statistical estimation. Then pricing of derivative
contracts on a single underlying asset in the Black and Scholes-Merton
framework (Black and Scholes 1973; Merton 1973), pricing of basket options,
volatility, covariation and regime switching analysis are considered. At the same
time, the book considers jump diffusions and telegraph process models and
pricing under these dynamics.

1 CRAN, Comprehensive R Archive Network, http://cran.r-project.org.

1.1 The world of derivatives 

There are many kinds of financial markets characterized by the nature of the 
financial products exchanged rather than their geographical or physical location. 
Examples of these markets are: 

• stock markets: this is the familiar notion of stock exchange markets, like 
New York, London, Tokyo, Milan, etc.; 

• bond markets: for fixed return financial products, usually issued by central 
banks, etc.; 

• currency markets or foreign exchange markets: where currencies are 
exchanged and their prices are determined; 

• commodity markets: where prices of commodities like oil, gold, etc. are 
fixed; 

• futures and options markets: for derivative products based on one or 
more other underlying products typical of the previous markets. 

The book is divided into two parts (although some natural overlapping occurs). In
the first part the modelling and analysis of dynamics of asset prices and interest
rates are presented (Chapters 3, 4 and 5). In the second part, the theory and practice
of derivatives pricing are presented (Chapters 6 and 7). Chapter 2 and part of
Chapter 3 contain basic probabilistic and statistical infrastructure for the subsequent
chapters. Chapter 4 introduces the basic numerical tools which, usually in
finance, complement the analytical results presented in the other parts. Chapter 8
presents an introduction to recently introduced models which go beyond the standard
model of Black and Scholes, and Chapter 9 presents accessory results
for the analysis of financial time series which are useful in risk analysis and
portfolio choices.

1.1.1 Different kinds of contracts 

Derivatives are simply contracts applied to financial products. The most traded,
and also the object of our interest, are the options. An option is a contract that
gives the right to sell or buy a particular financial product at a given price on
a predetermined date. Options are clearly asymmetric contracts and what is really
sold is the ‘option’ of exercising a given right. Other common derivative contracts are
the so-called futures or forwards. Forwards and futures are contracts which oblige
one to sell or buy a financial product at a given price on a certain date to another
party. Options and futures are similar in that, e.g., prices and dates are prescribed,
but clearly in one case what is traded is an opportunity to trade and in the other
an obligation. We mainly focus on option pricing and we start with an example.

1.1.2 Vanilla options 

Vanilla options 2 is a term that indicates the most common form of options. An 
option is a contract with several ingredients: 

• the holder: who subscribes the financial contract;

• the writer: the seller of the contract; 

• the underlying asset: the financial product, usually but not necessarily a 
stock asset, on which the contract is based; 

• the expiry date: the date on which the right (to sell or buy) the underlying 
asset can be exercised by the holder; 

• the exercise or strike price: the predetermined price for the underlying asset 
at the given date. 

Hence, the holder buys a right and not an obligation (to sell or buy); conversely,
the writer is obliged to honor the contract (sell or buy at a given price)
at the expiry date.

The right of this choice has an economic value which has to be paid in
advance. At the same time, the writer has to be compensated for the obligation.
Hence the problem of fixing a fair price for an option contract arises. So, option
pricing should answer the following two questions:

• how much should one pay for the right of choice? i.e. how to fix the price
of an option so that it is accepted by the holder?

• how to minimize the risk associated with the obligation of the writer? i.e.
to which (economic) extent can the writer reasonably support the cost of
the contract?

2 From Free On-Line Dictionary of Computing, http://foldoc.doc.ic.ac.uk. Vanilla: (Default
flavour of ice cream in the US) Ordinary flavour, standard. When used of food, very often does
not mean that the food is flavoured with vanilla extract! For example, ‘vanilla wonton soup’ means
ordinary wonton soup, as opposed to hot-and-sour wonton soup. This word differs from the canonical
in that the latter means ‘default’, whereas vanilla simply means ‘ordinary’. For example, when
hackers go to a Chinese restaurant, hot-and-sour wonton soup is the canonical wonton soup to get
(because that is what most of them usually order) even though it isn’t the vanilla wonton soup.

Example 1.1.1 (From Wilmott et al. (1995)) Suppose that there exists an asset
on the market which is sold at $25 and assume we want to fix the price of an
option on this asset with an expiry date of 8 months and exercise price of buying
this asset at $25. Assume there are only two possible scenarios: (i) in 8 months
the price of the asset rises to $27 or (ii) in 8 months the price of the asset falls to
$23. In case (i) the potential holder of the option can exercise the right, pay $25
to the writer to get the asset, sell it on the market at $27 to get a return of $2, i.e.

$27 - $25 = $2.

In scenario (ii), the option will not be exercised, hence the return is $0.
If both scenarios are likely to happen with the same probability of 1/2, the expected
return for the potential holder of this option will be

1/2 × $0 + 1/2 × $2 = $1.

So, if we assume no transaction costs, no interest rates, etc., the fair value of this
option should be $1. If this is the fair price, a holder investing $1 in this contract
could gain -$1 + $2 = $1 in scenario (i), which means 100% of the invested money,
and in scenario (ii) -$1 + $0 = -$1, i.e. a total loss of 100%. This means
that derivatives are extremely risky financial contracts that, even in this simple
example, may lead to 100% of gain or 100% of loss.

Now, assume that the potential holder, instead of buying the option, just buys
the asset. In case (i) the return from this investment would be -$25 + $27 =
$2 which means +2/25 = 0.08 (+8%) and in scenario (ii) -$25 + $23 = -$2
which equates to a loss of value of -2/25 = -0.08 (-8%).
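
The arithmetic of Example 1.1.1 can be checked with a few lines of R. This is only a minimal sketch: the variable names are chosen here for illustration, and the equal probabilities, zero interest rate and absence of transaction costs are the assumptions stated in the example.

```r
# Two-scenario example: asset at $25 today, call option with strike $25
S0   <- 25                          # current asset price
K    <- 25                          # strike (exercise) price
ST   <- c(up = 27, down = 23)       # possible asset prices in 8 months
prob <- c(up = 0.5, down = 0.5)     # equal probabilities, as in the example

# Return of the option for the holder in each scenario
payoff <- pmax(ST - K, 0)

# Fair price with no interest rates and no transaction costs
price <- sum(prob * payoff)

# Relative gain or loss for the option holder and for a buyer of the asset
option_return <- (payoff - price) / price
asset_return  <- (ST - S0) / S0

print(price)
print(option_return)
print(asset_return)
```

The option holder gains or loses 100% of the invested money, while the direct buyer of the asset gains or loses 8%.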

From the previous example we learn different things: 

• the value of an option reacts quickly (instantaneously) to the variation of 
the underlying asset; 

• to fix the fair price of an option we need to know the price of the underlying 
asset at the expiry date: either we have a crystal ball or a good predictive 
model. We try the second approach in Chapters 3 and 5; 

• the higher the final price of the underlying asset the larger will be the profit; 
hence the price depends on the present and future values of the asset; 

• the value of the option also depends on the strike price: for a call option
like the one in the example, the lower the strike price, the larger the return
for the holder and hence the loss for the writer;

• clearly, the expiry date of the contract is another key ingredient: the closer 
the expiry date, the less the uncertainty on future values of the asset and 
vice versa; 

• if the underlying asset has high volatility (variability) this is reflected by
the risk (and price) of the contract, because there is less certainty about future
values of the asset. The study of volatility and Greeks will be the subject
of Chapters 5, 6 and 9.

It is also worth remarking that, in pricing an option (as any other risky contract)
there is a need to compare the potential revenue of the investment against fixed
return contracts, like bonds, or, at least, interest rates. We will discuss models for
the description of interest rates in the second part of Chapter 5. To summarize,
the value of an option is a function of roughly the following quantities:

option value = f(current asset price, strike price,
                 final asset price, expiry date,
                 interest rate)

Although we can observe the current price of the asset and predict interest 
rates, and we can fix the strike price and the expiry date, there is still the need to 
build predictive models for the final price of the asset. In particular, we will not be 
faced with two simple scenarios as in the previous example, but with a complete 
range of values with some variability which is different from asset to asset. So 
not only do we need good predictive models but also some statistical assessment 
and calibration of the proposed models. In particular we will be interested in 
calculating the expected value of f mainly as a function of the final value of the
asset price, i.e.

payoff = E{f(...)}

This is the payoff of the contract which will be used to determine the fair value
of an option. This payoff is rarely available in closed analytical form and hence 
simulation and Monte Carlo techniques are needed to estimate or approximate it. 
The bases of this numerical approach are set in Chapter 4. 
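
As a first taste of the Monte Carlo idea, the following sketch estimates an expected value of the form E{f(...)} by averaging over simulated final prices. Everything specific in it is an assumption made here for illustration and not taken from the text: the lognormal model for the final price, the numerical values of `mu` and `sigma`, the choice of f as the return of a call, and the absence of discounting.

```r
set.seed(123)                       # for reproducibility

S0     <- 25                        # current asset price (illustrative)
K      <- 25                        # strike price (illustrative)
mu     <- 0.05                      # assumed yearly drift
sigma  <- 0.20                      # assumed yearly volatility
expiry <- 8 / 12                    # time to expiry in years
n      <- 100000                    # number of simulated scenarios

# Assumed model for the final price: lognormal distribution
Z  <- rnorm(n)
ST <- S0 * exp((mu - 0.5 * sigma^2) * expiry + sigma * sqrt(expiry) * Z)

# f(...) seen as a function of the final price: return of a call option
f <- function(s, strike) pmax(s - strike, 0)

# Monte Carlo estimate of E{f(...)} together with its standard error
values <- f(ST, K)
mc_est <- mean(values)
mc_se  <- sd(values) / sqrt(n)

print(c(estimate = mc_est, std.error = mc_se))
```

The standard error shrinks at the rate 1/sqrt(n), which is why a large number of simulated scenarios is typically needed.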

The option presented in Example 1.1.1 was in fact a call option, where call
means the ‘right to buy’. An option that gives a right to sell at some price is
called a put option. In a put option, the writer is obliged to buy from the holder
an asset at some given price (clearly, when the underlying asset has a lower
value on the market). We will see that the structure of the payoff of a put option
is very similar to that of a call, although specular considerations on its value
are appropriate, e.g. while the holder of a call option hopes for the rise of the
price of the asset, the owner of the put hopes for the decrease of this price, etc.
Table 1.1 reports put and call prices for the Rolls-Royce asset. When the strike
price is 130, the cost of a call is higher than the cost of the put. This is because
the current price is 134 and even a small increase in the value produces a gain
of at least $4. In the case of the put, the price should fall more than $4 in order
to exercise the option. Of course all the prices are functions of the expiry dates.
The situation is similar for the higher strike price (140), but with smaller prices
for the calls and larger prices for the puts.

Table 1.1 Financial Times, 4 Feb. 1993. (134): asset price at closing on
3 Feb. 1993. Mar., June, Sep.: expiry date, third Wednesday of each month.

                            Calls                Puts
Option     Ex. Price    Mar   Jun   Sep     Mar   Jun   Sep
R.Royce       130        11    15    19       9    14    17
(134)         140         6    11    16      16    20    23

1.1.3 Why options? 

Usually options are not primary financial contracts in one’s portfolio, but they
are often used along with assets on which the derivative is based. A traditional
investor may decide to buy stocks of a company if he or she believes that the
company will increase its value. If right, the investor can sell at a proper time
and obtain some gain; if wrong, the investor can sell the shares before the price
falls too much. If instead of real stocks the investor buys options on that stock,
the loss or gain can go up to 100% of the investment, as shown in Example 1.1.1.
But if one is risk averse and wants to add a small risk to the portfolio,
a good way to do this is to buy regular stocks and some options on the same
stock. Also, in a long-term strategy, if one owns shares and options of the same
asset and some temporary decrease of value occurs, one can decide to use or buy
options to compensate this temporary loss of value instead of selling the stocks.
For one reason or another, options are more liquid than the underlying assets,
i.e. there are more options on an asset than available shares of that asset.

So options imply high risk for the holder, whose outcome ranges from the complete
loss of the investment up to its doubling. Symmetrically, the writer exposes himself to
this obligation for a small initial payment of the contract (see e.g. Table 1.1). So,
who on earth may decide to be a writer of one of these contracts of small revenue
and high risk? Because options exist on the market, their price should be fixed
in a way that is considered convenient (or fair) for both the holder and the writer.
Surely, if writers have more information on the market than a casual holder, then
transaction costs and other technical aspects may give enough profit to afford the
role of writers. The strategy that allows the writer to cover the risk of selling an
option to a holder at a given price is called hedging. More precisely, the hedging
strategy is part of the way option pricing is realized (along with the notion of
non-arbitrage which will be discussed in detail in Chapter 6). Suppose we have
an asset with decreasing value. If a portfolio contains only assets of this type, its
value will decrease accordingly. If the portfolio contains only put options on that
asset, the value of the portfolio will increase. A portfolio which includes both
assets and put options in appropriate proportion may reduce the risk to the extent
of eliminating it (risk-free strategy). Hedging is a portfolio strategy which
balances options and assets in order to reduce the risk. If the writer is able to
sell an option at some price slightly higher than its real value, he may construct
a hedging strategy which covers the risk of selling the option and eventually
gain some money, i.e. obtain a risk-free profit. Risk-free strategies (as defined in
Chapter 6) are excluded in the theory of Black and Scholes.
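
The effect of mixing assets and put options can be illustrated with the two scenarios of Example 1.1.1. The sketch below is only an illustration under assumptions introduced here: a put with strike $25 on the same asset, no interest rates, and hypothetical names such as `portfolio_value` and `shares`. It looks for the number of shares which, held together with one put, gives the same portfolio value in both scenarios.

```r
ST <- c(up = 27, down = 23)         # asset price at expiry in the two scenarios
K  <- 25                            # strike price of the put (assumed)

# Return of one put option in each scenario: 0 if the price rises, 2 if it falls
put_payoff <- pmax(K - ST, 0)

# Value at expiry of a portfolio with `shares` units of the asset and one put
portfolio_value <- function(shares) shares * ST + put_payoff

# Unhedged positions move with the asset price, in opposite directions
only_asset <- 1 * ST
only_put   <- put_payoff

# Proportion that equalizes the two scenarios:
# shares * 27 + 0 = shares * 23 + 2
shares <- unname((put_payoff["down"] - put_payoff["up"]) /
                 (ST["up"] - ST["down"]))

hedged <- portfolio_value(shares)

print(shares)
print(hedged)
```

With half a share per put the portfolio is worth the same amount whether the price rises or falls, which is the sense in which an appropriate proportion eliminates the risk in this toy setting.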

1.1.4 A variety of options

Options like the ones introduced so far are called European options. The name
European has nothing to do with the market on which they are exchanged but with
the type of the contract itself. European options are contracts for which the
right to buy (European call option) or sell (European put option) can be exercised
only at a fixed expiry date. These options are the object of Chapter 6.

Options which can be exercised during the whole period of existence of the 
contract are called American options. Surely, the pricing of American options 
is more complicated than the pricing of European options because instead of a 
single fixed horizon, the complete evolution of the underlying asset has to be 
predicted in the most accurate way. In particular, the main point in possessing an
American option is to find the optimal time at which to exercise the option. This
is the object of Chapter 7.

In both cases, options have not only an initial value (the initial fair price) 
but their value changes with time and options can be exchanged on the market 
before expiry date. So, the knowledge of the price of an option over the whole 
life of the contract is interesting in both situations. 

Another classification of options is based on the way the payoff is determined.
Even in the case of European options, it might happen that the final payoff of the
option is determined not only by the last value of the underlying asset but also by
the complete evolution of the price of the same asset, for example, via some kind
of averaging. These are called exotic options (or path-dependent options). This is
typical of options based on underlying products like commodities (where usually
the payoff depends on the distance between the strike price and the average price
during the whole life of the contract, the maximal or minimal value, etc.) or
interest rates, where some geometric average is considered.

Average is a concept that applies to discrete values as well as to continuous 
values (think about the expected value of random variables). Observations always 
come in discrete form as a sequence of numbers, but analytical calculations are 
made on continuous time processes. The errors due to discretization of continuous 
time models affect both calibration and estimation of the payoffs. We will discuss 
this issue throughout the text. 

Path-dependent options can be of European or American type and can be 
further subclassified according to the following categories which actually reflect 
analytical ways to treat them: 

• barrier options: exercised only if the underlying asset reaches (or not) a
prescribed value during the whole period (for example, in a simple European
option with a given strike price, the option may be exercised by
the holder only if the asset does not grow too much in order to contain
the risk);

• Asian options: the final payoff is a function of some average of the price
of the underlying asset;

• lookback options: the payoff depends on the maximal or minimal value of
the asset during the whole period.
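
The three categories above differ only in which feature of the price path enters the final value. The following sketch makes this concrete on a short, made-up price path. The path, the strike and the barrier level are hypothetical, and the specific forms used (arithmetic average, fixed strike, up-and-out barrier, call side only) are common textbook variants chosen here as assumptions, not necessarily the definitions adopted later in the book.

```r
# Hypothetical observed price path of the underlying asset (not real data)
path    <- c(25, 26, 28, 27, 24, 26, 29, 27)
K       <- 25                       # strike price
barrier <- 30                       # knock-out level for the barrier option

# European call: depends only on the last value of the path
european <- max(path[length(path)] - K, 0)

# Asian call: depends on the (arithmetic) average price over the path
asian <- max(mean(path) - K, 0)

# Lookback call (fixed strike): depends on the maximal value over the path
lookback <- max(max(path) - K, 0)

# Up-and-out barrier call: a European call that is cancelled
# if the price ever reaches the barrier
up_and_out <- if (any(path >= barrier)) 0 else european

print(c(european = european, asian = asian,
        lookback = lookback, up_and_out = up_and_out))
```

On this path the last price is 27, the average is 26.5 and the maximum is 29, so the four values differ even though the strike is the same.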

Notice that in this brief outlook on option pricing we mention only options on
a single asset. Although single-asset options are pedagogically useful to explain the
basic concepts of option pricing, many real options are based on more than one
underlying asset. We will refer
to these options as basket options and consider them in Chapter 6, Section 6.7.
As for any portfolio strategy, correlation of financial products is something to 
take into account and not just the volatility of each single asset included in the 
portfolio. We will discuss the monitoring of volatility and covariance estimation 
of multidimensional financial time series in Chapter 9. 


1.1.5 How to model asset prices 

Modern mathematical finance originated from the doctoral thesis of Bachelier
(1900) but was formally proposed in a complete financial perspective by Black
and Scholes (1973) and Merton (1973). The basic model to describe asset prices
is the geometric Brownian motion. Let us denote by \(\{S(t), t \geq 0\}\) the price of
an asset at time \(t\), for \(t \geq 0\). Consider the small time interval \(dt\) and the
variation of the asset price in the interval \([t, t + dt)\) which we denote by
\(dS(t) = S(t + dt) - S(t)\). The return for this asset is the ratio between \(dS(t)\) and \(S(t)\).
We can model the returns as the result of two components:

\[ \frac{dS(t)}{S(t)} = \text{deterministic contribution} + \text{stochastic contribution} \]

The deterministic contribution is related to interest rates or bonds and is a risk
free trend of this model (usually called the drift). If we assume a constant return
\(\mu\), after a time \(dt\) the deterministic contribution to the returns of \(S\) will be \(\mu\,dt\):

\[ \text{deterministic contribution} = \mu\,dt. \]

The stochastic contribution is instead related to exogenous and nonpredictable
shocks on the assets or on the whole market. For simplicity, these shocks are
assumed to be symmetric, zero mean etc., i.e. typical Gaussian shocks. To
separate the contribution of the natural volatility of the asset from the stochastic
shocks, we assume further that the stochastic part is the product of \(\sigma > 0\) (the
volatility) and the variation of stochastic Gaussian noise \(dW(t)\):

\[ \text{stochastic contribution} = \sigma\,dW(t). \]
It is further assumed that the stochastic variation \(dW(t)\) has a variance
proportional to the time increment, i.e. \(dW(t) \sim N(0, dt)\). The process \(W(t)\),
which is such that \(dW(t) = W(t + dt) - W(t) \sim N(0, dt)\), is called the Wiener
process or Brownian motion. Putting all together, we obtain the following
equation:

\[ \frac{dS(t)}{S(t)} = \mu\,dt + \sigma\,dW(t) \]

which we can rewrite in differential form as follows:

\[ dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t). \tag{1.1} \]

This is a difference equation, i.e.
\(S(t + dt) - S(t) = \mu S(t)\,dt + \sigma S(t)\,(W(t + dt) - W(t))\),
and if we take the limit as \(dt \to 0\), the above is a formal writing of
what is called a stochastic differential equation, which is intuitively very simple
but mathematically not well defined as is. Indeed, taking the limit as \(dt \to 0\)
we obtain the following differential equation:

\[ S'(t) = \mu S(t) + \sigma S(t) W'(t) \]

but \(W'(t)\), the derivative of the Wiener process with respect to time, is not
well defined in the mathematical sense. But if we rewrite (1.1) in integral form
as follows:

\[ S(t) = S(0) + \mu \int_0^t S(u)\,du + \sigma \int_0^t S(u)\,dW(u) \]

it is well defined. The last integral is called stochastic integral or Itô integral
and will be defined in Chapter 3. The geometric Brownian motion is the process
S(t) which solves the stochastic differential equation (1.1) and is at the basis of
the Black and Scholes and Merton theory of option pricing. Chapters 2 and 3
contain the necessary building blocks to understand the rest of the book.
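
The difference-equation reading of (1.1) translates directly into a simulation scheme: draw Gaussian increments with variance dt and update the price step by step. The sketch below does this in base R. It is only an illustration: the parameter values are arbitrary, and the comparison uses the well-known closed-form solution of the geometric Brownian motion, S(t) = S(0) exp((mu - sigma^2/2) t + sigma W(t)), which is a standard result not stated in this section.

```r
set.seed(1)                         # for reproducibility

S0      <- 100                      # initial price S(0) (illustrative)
mu      <- 0.05                     # drift (illustrative)
sigma   <- 0.20                     # volatility (illustrative)
horizon <- 1                        # final time
n       <- 1000                     # number of time steps
dt      <- horizon / n

# Increments of the Wiener process: W(t + dt) - W(t) ~ N(0, dt)
dW <- rnorm(n, mean = 0, sd = sqrt(dt))
W  <- c(0, cumsum(dW))

# Difference equation:
# S(t + dt) = S(t) + mu * S(t) * dt + sigma * S(t) * (W(t + dt) - W(t))
S <- numeric(n + 1)
S[1] <- S0
for (i in 1:n) {
  S[i + 1] <- S[i] + mu * S[i] * dt + sigma * S[i] * dW[i]
}

# Closed-form solution evaluated on the same Brownian path, for comparison
tt      <- (0:n) * dt
S_exact <- S0 * exp((mu - 0.5 * sigma^2) * tt + sigma * W)

max_rel_err <- max(abs(S - S_exact) / S_exact)
print(max_rel_err)
```

Reducing `dt` makes the simulated path approach the exact one, which gives a first idea of the discretization error mentioned earlier in the chapter.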

1.1.6 One step beyond 

Unfortunately, the statistical analysis of financial time series, as described by
the geometric Brownian motion, is not always satisfactory in that financial data
do not fit very well the hypotheses of this theory (for example the Gaussian
assumption on the returns). In Chapter 8 we present other recently introduced
models which account for several discrepancies noticed on the analysis of real
data and theoretical results where the stochastic noise W(t) is replaced by other
stochastic processes. Chapter 9 treats some advanced applied topics like monitoring
of the volatility, estimation of covariation for asynchronous time series, model
selection for sparse diffusion models and explorative data analysis of financial
time series using cluster analysis.

1.2 Bibliographical notes

The principal text on mathematical finance is surely Hull (2000). This is so far the
most complete overview of modern finance. Other texts may be good companions
to approach the subject because they use different levels of formalism. Just to mention
a few, we can point to the two books Wilmott (2001) and Wilmott et al. (1995).
The more mathematically oriented reader may prefer books like Shreve (2004a,b),
Dineen (2005), Williams (2006), Mikosch (1998), Cerny (2004), Grossinho et al.
(2006), Korn and Korn (2001) and Musiela and Rutkowski (2005). More numerically
oriented publications are Fries (2007), Jackel (2002), Glasserman (2004),
Benth (2004), Ross (2003), Rachev (2004) and Scherer and Martin (2005). Other
books more oriented to the statistical analysis of financial time series are Tsay
(2005), Carmona (2004), Ruppert (2006) and Franke et al. (2004).

2 Probability, random variables and statistics

As seen, the modeling of randomness is one of the building blocks of mathematical
finance. We recall here the basic notions on probability and random variables
limiting the exposition to what will be really used in the rest of the book. The
expert reader may skip this chapter and use it only as a reference and to align
his own notation to the one used in this book.

Later on we will also discuss the problem of good calibration strategies of 
financial models, hence in this chapter we also recall some elementary concepts 
of statistics. 


2.1 Probability 

Random variables are functions of random elements with the property of being 
measurable. To make this subtle statement more precise we need the following 
preliminary definitions. 

Definition 2.1.1 Let \(\Omega\) be some set. A family \(\mathcal{A}\) of subsets of \(\Omega\) is called a
\(\sigma\)-algebra on \(\Omega\) if it satisfies the following properties:

(i) \(\emptyset \in \mathcal{A}\);

(ii) if \(A \in \mathcal{A}\) then its complementary set \(\bar{A}\) is in \(\mathcal{A}\);

(iii) countable unions of elements of \(\mathcal{A}\) belong to \(\mathcal{A}\), i.e. let \(A_n \in \mathcal{A}\),
\(n = 1, 2, \ldots\), then \(\bigcup_n A_n \in \mathcal{A}\).

For example, the family \(\{\emptyset, \Omega\}\) is a \(\sigma\)-algebra and it is called the trivial
\(\sigma\)-algebra. Let \(S\) be some set. We denote by \(\sigma(S)\) or \(\sigma\{A : A \subset S\}\) the
\(\sigma\)-algebra generated by
the subsets \(A\) of \(S\), i.e. the family of sets which includes the empty set, each single
subset of \(S\), their complementary sets and all their possible countable unions.

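Definition 2.1.2 Let $\mathcal{A}$ be the $\sigma$-algebra on $\Omega$. A set function $P$ is a probability measure on $\mathcal{A}$ if it is a function $P : \mathcal{A} \to [0, 1]$ which satisfies the three axioms of probability due to Kolmogorov (1933):

(i) $\forall A \in \mathcal{A}$, $P(A) \ge 0$;

(ii) $P(\Omega) = 1$;

(iii) let $A_1, A_2, \ldots$ be a sequence of elements of $\mathcal{A}$ such that $A_i \cap A_j = \emptyset$ for $i \ne j$, then

\[ P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i). \]

The last formula in axiom (iii) is called complete additivity.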

Definition 2.1.3 A probability space is a triplet $(\Omega, \mathcal{A}, P)$, where $\Omega$ is a generic set of elements (which can be thought of as the collection of all possible outcomes of a random experiment), sometimes called the 'sample space', $\mathcal{A}$ is the $\sigma$-algebra generated by $\Omega$, i.e. the set of all sets (the events) for which it is possible to calculate a probability, and $P$ is a probability measure.


Example 2.1.4 Consider the experiment which consists of throwing a die. The set of possible outcomes of the experiment, i.e. the sample space, is $\Omega = \{1, 2, 3, 4, 5, 6\}$ and $\mathcal{A}$ is the $\sigma$-algebra generated by the elementary events $\{1\}, \ldots, \{6\}$ of $\Omega$, i.e. the $\sigma$-algebra of all events for which it is possible to evaluate some probability. Consider the events $E$ = 'even number' $= \{2, 4, 6\}$ and $F$ = 'number greater than 4' $= \{5, 6\}$. Both events can be obtained as unions of elementary events of $\Omega$ and both belong to $\mathcal{A}$. Does the event $G$ = 'number 7' belong to $\mathcal{A}$? Of course not: it is not possible to obtain $\{7\}$ by unions, complementation, etc., of elements of $\Omega$. In the previous example, if the die is fair, i.e. each face is equiprobable, then $P$ can be constructed as follows:

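\[ P(A) = \frac{\#\,\text{elementary elements in } A}{\#\,\text{elementary elements in } \Omega} = \frac{|A|}{|\Omega|} \]

therefore

\[ P(E) = \frac{\#\{2, 4, 6\}}{\#\{1, 2, 3, 4, 5, 6\}} = \frac{3}{6} = \frac{1}{2}, \qquad P(F) = \frac{\#\{5, 6\}}{\#\{1, 2, 3, 4, 5, 6\}} = \frac{2}{6} = \frac{1}{3}, \]

where $\#A$ stands for: 'number of elements in set $A$'.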
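The counting rule above is easy to reproduce numerically. The following is only a minimal sketch in R, assuming a fair die; the names `omega`, `prob` and `F_` are introduced here for illustration and are not part of any package.

```r
# Sample space of a single throw of a die
omega <- 1:6

# Classical probability for a fair die: number of outcomes of A that
# belong to omega, divided by the number of outcomes in omega
prob <- function(A) length(intersect(A, omega)) / length(omega)

E  <- c(2, 4, 6)   # 'even number'
F_ <- c(5, 6)      # 'number greater than 4' (F_ avoids masking R's F)

prob(E)    # P(E) = 3/6
prob(F_)   # P(F) = 2/6
prob(7)    # 'number 7' contains no outcome of omega, so this is 0
```

The helper returns 0 for any set that shares no outcome with `omega`, which is a computational shortcut and not a statement about measurability.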
The following properties follow as easy consequences of the axioms of probability and are left as an exercise.

Exercise 2.1 Let $A \in \mathcal{A}$. Prove that $P(\bar{A}) = 1 - P(A)$ and $P(A) \le 1$.

Theorem 2.1.5 If $A$ and $B$ are two elements of $\mathcal{A}$, then

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B). \tag{2.1} \]

For any collection of sets $A_1, A_2, \ldots$ in $\mathcal{A}$, Equation (2.1) generalizes to

\[ P\Big(\bigcup_i A_i\Big) \le \sum_i P(A_i). \]

This last property is called sub-additivity. If $A \subset B$ then

\[ P(A) \le P(B). \tag{2.2} \]

Exercise 2.2 Prove (2.1) and (2.2).

Using set theory one can prove the following relationships among complementary sets, called De Morgan's laws,

\[ \overline{A \cup B} = \bar{A} \cap \bar{B} \quad \text{and} \quad \overline{A \cap B} = \bar{A} \cup \bar{B}, \]

and the distributive property of the intersection and union operators

\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C). \]


2.1.1 Conditional probability 


Definition 2.1.6 The conditional probability of $A$ given $B$ is defined as

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

for sets $B$ such that $P(B) > 0$.


The conditional probability seen as $P(\cdot|B)$ is a true probability measure, which can also be written as $P_B(\cdot) = P(\cdot|B)$. Conditioning only affects the probability of events but not the events themselves, which means that the expectation on the realization of some event $A$ may change due to the knowledge about another event $B$.

Exercise 2.3 Prove that $P_B(\cdot) = P(\cdot|B)$ is a probability measure.

Conditional probability is a key aspect of the calculus of probability from which 
other notions derive like the next one. 
Definition 2.1.7 Two events $A, B \in \mathcal{A}$ are said to be independent if and only if

\[ P(A \cap B) = P(A) \cdot P(B) \]

or, alternatively, if and only if $P(A|B) = P(A)$ and $P(B|A) = P(B)$.

If $A$ is independent of $B$, then $B$ is also independent of $A$. In fact, given that $A$ is independent of $B$ we have $P(A|B) = P(A \cap B)/P(B) = P(A)$, hence $P(A \cap B) = P(A|B)P(B) = P(A)P(B)$. Then, $P(B|A) = P(A \cap B)/P(A) = P(A)P(B)/P(A) = P(B)$.

Definition 2.1.8 The events $A_1, A_2, \ldots$ are said to be independent if and only if, for every collection of indexes $j_1 \ne j_2 \ne \cdots \ne j_k$ and any $k > 1$, we have

\[ P\Big(\bigcap_{i=j_1}^{j_k} A_i\Big) = \prod_{i=j_1}^{j_k} P(A_i). \]

So, from the previous definition, events are independent if they are mutually independent in pairs, triplets, etc.


2.2 Bayes’ rule 


Definition 2.2.1 A family $\{A_i, i = 1, \ldots, n\}$ of subsets of $\Omega$ is called a partition of $\Omega$ if

\[ \bigcup_{i=1}^{n} A_i = \Omega \quad \text{and} \quad A_i \cap A_j = \emptyset \quad \forall\, i \ne j, \; i, j = 1, \ldots, n. \]

The following result is easy to prove.

Exercise 2.4 Let $\{A_i, i = 1, \ldots, n\}$ be a partition of $\Omega$ and $E$ another event in $\Omega$. Prove that

\[ P(E) = \sum_{i=1}^{n} P(E|A_i)P(A_i). \tag{2.3} \]

Result (2.3) is sometimes called the law of total probability and it is a very 
useful tool to decompose the probability of some event into the sum of weighted 
conditional probabilities which are often easier to obtain. The next result comes 
directly as an application of the previous formula and is called Bayes’ rule. 


Theorem 2.2.2 (Bayes' rule) Let $\{A_i, i = 1, \ldots, n\}$ be a partition of $\Omega$ and $E \subset \Omega$, then

\[ P(A_j|E) = \frac{P(E|A_j)P(A_j)}{P(E)} = \frac{P(E|A_j)P(A_j)}{\sum_{i=1}^{n} P(E|A_i)P(A_i)}. \tag{2.4} \]

The terms $P(A_j)$ are sometimes called the prior probabilities and the terms $P(A_j|E)$ the posterior probabilities, where prior and posterior are with reference to the occurrence of (or knowledge about) the event $E$. This means that the knowledge about event $E$ changes the initial belief on the realization of event $A_j$, i.e. $P(A_j)$, into $P(A_j|E)$. The proof of (2.4) is easy if one notices that $P(A_j|E) = P(A_j \cap E)/P(E) = P(E|A_j)P(A_j)/P(E)$.

Example 2.2.3 (The Monty Hall problem) A hypothetical man dies in a 
hypothetical probabilistic world. After his death he faces three doors: red, green 
and blue. A desk clerk explains that two doors will bring him to hell, one to 
paradise. The man chooses the red door. Before he opens the red door, the desk 
clerk (who knows about the door to paradise) opens the blue door showing that 
it is the door to hell. Then he asks the man if he wants to keep the red door or 
change it with the green one. Will you change the door? 

Apparently, given that two doors are left, each door has a 50% probability of leading to paradise, so there is no point in changing doors. But this is not the case. Denote by $D_R$, $D_G$ and $D_B$ the events 'paradise is behind the red/green/blue door' respectively, and denote by $B$ the event 'the desk clerk opens the blue door'. The three events $D_R$, $D_G$ and $D_B$ constitute a partition of $\Omega$ and, before event $B$ occurs, $P(D_R) = P(D_G) = P(D_B) = 1/3$ because there is no initial clue as to which one is the door to paradise. Clearly $P(D_B|B) = 0$. We now calculate $P(D_R|B)$ and $P(D_G|B)$. We first need $P(B)$. If paradise is behind the red door, the clerk chooses to show either the blue or the green door, so in particular $B$ with 50% probability, hence $P(B|D_R) = 1/2$. If paradise is behind the green door, the clerk is forced to open the blue door, hence $P(B|D_G) = 1$. Clearly, $P(B|D_B) = 0$. By (2.3) we have

\[ P(B) = P(B|D_R)P(D_R) + P(B|D_G)P(D_G) + P(B|D_B)P(D_B) = \frac{1}{2} \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} = \frac{1}{2}. \]

Therefore, by Bayes' rule

\[ P(D_R|B) = \frac{P(B|D_R)P(D_R)}{P(B)} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{1}{3}, \]

\[ P(D_G|B) = \frac{P(B|D_G)P(D_G)}{P(B)} = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}, \]

\[ P(D_B|B) = 0. \]

So the conclusion is that the man should exchange the red door for the green one because $P(D_G|B) > P(D_R|B)$.
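The same posterior probabilities can be checked in R, first exactly via Bayes' rule and then by simulation. This is only a sketch: the seed, the number of simulated games and all variable names are arbitrary choices made here for illustration.

```r
# Exact computation by Bayes' rule (2.4)
prior <- c(red = 1/3, green = 1/3, blue = 1/3)  # P(D_R), P(D_G), P(D_B)
lik   <- c(red = 1/2, green = 1,   blue = 0)    # P(B | D_.)
pB    <- sum(lik * prior)                       # law of total probability (2.3)
post  <- lik * prior / pB                       # posterior probabilities
post

# Monte Carlo check
set.seed(123)      # arbitrary seed, for reproducibility only
n <- 100000        # arbitrary number of simulated games
paradise <- sample(c("red", "green", "blue"), n, replace = TRUE)
# The man always picks red; the clerk opens a door to hell other than red,
# choosing at random between green and blue when paradise is behind red
opened <- ifelse(paradise == "green", "blue",
          ifelse(paradise == "blue", "green",
                 sample(c("green", "blue"), n, replace = TRUE)))
B <- opened == "blue"                      # condition on the event B
p_red   <- mean(paradise[B] == "red")      # estimate of P(D_R | B)
p_green <- mean(paradise[B] == "green")    # estimate of P(D_G | B)
c(red = p_red, green = p_green)
```

The simulated frequencies should be close to 1/3 and 2/3, up to Monte Carlo error.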

Definition 2.2.4 Two probability measures $P$ and $Q$ on $(\Omega, \mathcal{A})$ are said to be equivalent if, for each set $A \in \mathcal{A}$, $P(A) > 0$ implies $Q(A) > 0$ and vice versa.

The above definition means that the two measures $P$ and $Q$ assign positive probability to the same events $A$ (though not necessarily the same numerical values of the probability).


2.3 Random variables 

We denote by $\mathcal{B}(\mathbb{R})$ the Borel $\sigma$-algebra generated by the open sets of $\mathbb{R}$ of the form $(-\infty, x)$, $x \in \mathbb{R}$, i.e. $\mathcal{B}(\mathbb{R}) = \sigma\text{-}\{(-\infty, x), x \in \mathbb{R}\}$. Consider a function $f : A \to B$ and take $S \subset B$, then the inverse image of $S$ by $f$ is denoted by $f^{-1}(S)$ and corresponds to the following subset of $A$:

\[ f^{-1}(S) = \{a \in A : f(a) \in S\}. \]

Definition 2.3.1 Given a probability space $(\Omega, \mathcal{A}, P)$, a real random variable $X$ is a measurable function from $(\Omega, \mathcal{A})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

Hence a random variable transforms the events belonging to $\Omega$ into real numbers with the additional requirement of measurability. Measurability of $X$ means the following: let $A$ be a given element of $\mathcal{B}(\mathbb{R})$, then to the event $X \in A$ there must correspond a set $B \in \mathcal{A}$ so that $P(B)$ is well defined. So it means that, whenever we want to calculate the probability of '$X \in A$', we can calculate it from the corresponding $P(B)$. More formally,

\[ \forall A \in \mathcal{B}(\mathbb{R}), \; \exists B \in \mathcal{A} : X^{-1}(A) = B \]

and hence

\[ \Pr(X \in A) = P(\{\omega \in \Omega : \omega \in X^{-1}(A)\}) = P(B), \quad A \subset \mathbb{R}, \; B \in \mathcal{A}, \]

where $X^{-1}(A)$ is the inverse image of $A$ by $X$. Note that we wrote $\Pr(X \in A)$ and not $P(X \in A)$ because the probability measure $P$ works on the elements of $\mathcal{A}$ but $X$ takes values in $\mathbb{R}$. More formally, a probability measure, say $Q$, on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ can be defined as $Q(A) = P(\{\omega \in \Omega : \omega \in X^{-1}(A)\})$ and hence the probability space $(\Omega, \mathcal{A}, P)$ is transformed by $X$ into a new probability space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), Q)$. To avoid too much abstraction, with abuse of notation, we simply write $P(X \in A)$ for $\Pr(X \in A) = Q(A)$. That is because, once $Q$ is defined, the original measure $P$ can be disregarded.

Example 2.3.2 Consider again the experiment in Example 2.1.4. Let $X$ be the random variable defined as follows: $X(\omega) = -1$ if $\omega = 1$ or $2$, $X(\omega) = 0$ if $\omega = 3, 4, 5$ and $X(\omega) = +1$ otherwise. We want to calculate $P(X \ge 0)$.

\[
\begin{aligned}
P(X \ge 0) &= P(X \in \{0, +1\}) \\
&= P(\omega \in \Omega : \omega \in X^{-1}(\{0, +1\})) \\
&= P(\{\omega \in \Omega : X(\omega) = 0\} \cup \{\omega \in \Omega : X(\omega) = +1\}) \\
&= P(\{3, 4, 5\} \cup \{6\}) = P(\{3, 4, 5, 6\}) = \frac{4}{6}
\end{aligned}
\]

so $X^{-1}(\{0, +1\}) = B$, where $B$ must be an element of $\mathcal{A}$, the $\sigma$-algebra of $\Omega$, in order to be able to calculate $P(X \ge 0)$.
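The inverse image in this example can be computed explicitly. The following R sketch assumes a fair die; `X` is written as an ordinary R function on the sample space and all names are illustrative.

```r
omega <- 1:6   # sample space of the die

# The random variable of the example: -1 on {1, 2}, 0 on {3, 4, 5}, +1 on {6}
X <- function(w) ifelse(w <= 2, -1, ifelse(w <= 5, 0, 1))

# Inverse image of {0, +1}: the outcomes omega such that X(omega) is in {0, +1}
B <- omega[X(omega) %in% c(0, 1)]
B

# For a fair die, P(X >= 0) = P(B) = |B| / |omega|
p <- length(B) / length(omega)
p
```

The probability is obtained from the measure on the original sample space, exactly as in the derivation above.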

Measurability of random variables at the moment appears as a technical requirement but for stochastic processes it is a more delicate subject because it is strictly related to the notion of information, and information in finance is a key aspect of the analysis. We will return to this aspect later on.

As mentioned, in practice with random variables, we never work directly with the probability measure $P$ on $(\Omega, \mathcal{A})$ but it is preferable to define their cumulative distribution function, or simply their distribution function, as follows:

\[ F(x) = P(X \le x) = P(X \in (-\infty, x]), \quad x \in \mathbb{R}. \]

The cumulative distribution function is a non-decreasing function of its argument and such that $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$. Further, for any two given numbers $a < b \in \mathbb{R}$, we have that

\[ P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a). \]

When the random variable is continuous $F(\cdot)$ may admit the representation¹

\[ F(x) = \int_{-\infty}^{x} f(u)\,du, \]

where $f(\cdot)$ is called the density function or simply density of $X$ and has the following properties: $f(x) \ge 0$ and $\int_{\mathbb{R}} f(x)\,dx = 1$. If $X$ is a discrete random variable, then $F(\cdot)$ is written as

\[ F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i) = \sum_{x_i \le x} p(x_i) \]

and $p(\cdot)$ is called the probability density function or mass function. Here and in the following, we denote by $x_i$ the generic value taken by the random variable $X$. Clearly, $p(x_i) \ge 0$ for all $i$ and $\sum_i p(x_i) = 1$. Notice that, while for a discrete random variable $P(X = k) \ge 0$, for a continuous random variable $P(X = k) = 0$ always, hence the density function $f(\cdot)$ and the probability mass function $p(\cdot)$ have different interpretations. For example, while $p(k) = P(X = k)$ in the discrete case, $f(x) \ne P(X = x) = 0$ but $f(x)\,dx \simeq P(X \in [x, x + dx)) \ge 0$.


¹ Not all continuous random variables admit a density but, for simplicity, we assume that it is always the case in this chapter.

Definition 2.3.3 Let $X$ be a one-dimensional random variable with distribution function $F(\cdot)$. We call the $q$-th quantile of $X$ the real number $x_q$ which satisfies

\[ x_q = \inf_x \{x : F(x) \ge q\}, \quad q \in [0, 1]. \]
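For a discrete distribution the infimum in the definition of the quantile can be computed by direct search over the support. The sketch below does this for a Binomial law and compares the result with R's `qbinom`; the helper name `quantile_inf` and the parameter values are illustrative assumptions.

```r
n <- 10; p <- 0.3
support <- 0:n   # values taken by a Bin(n, p) random variable

# q-th quantile as the smallest x in the support such that F(x) >= q
quantile_inf <- function(q) min(support[pbinom(support, n, p) >= q])

q  <- c(0.1, 0.25, 0.5, 0.75, 0.9)
xq <- sapply(q, quantile_inf)
xq
qbinom(q, n, p)   # built-in quantile function, for comparison
```

For these values of `q` the two computations agree, which is consistent with `qbinom` being documented as the smallest value `x` such that `F(x) >= q`.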

Definition 2.3.4 Two random variables $X$ and $Y$ are said to be independent if

\[ P(X \in A, Y \in B) = P(X \in A)P(Y \in B), \quad \text{for all } A, B \subset \mathbb{R}, \]

i.e. the probability that $X$ takes particular values is not influenced by that of $Y$.

The couple $(X, Y)$ of random variables has a joint cumulative distribution function which we denote by $F_{XY}(x, y) = P(X \le x, Y \le y)$. If both variables are continuous we denote the joint density function by $f_{XY}(x, y)$ and

\[ F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\,dv\,du. \]

If both are discrete we denote the joint probability mass function by $p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j)$ and

\[ F_{XY}(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} p_{XY}(x_i, y_j). \]

In case $X$ and $Y$ are independent, we have that $F_{XY}(x, y) = F_X(x)F_Y(y)$ where $F_X(\cdot)$ and $F_Y(\cdot)$ are the distribution functions of $X$ and $Y$. Similarly for the densities, i.e. $f_{XY}(x, y) = f_X(x)f_Y(y)$ and $p_{XY}(x_i, y_j) = p_X(x_i)p_Y(y_j)$. In general, if $X = (X_1, X_2, \ldots, X_n)^\top$ is a random vector², we denote the distribution function by

\[ F_X(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n) \]

and, if all components of $X$ are independent, the usual factorization holds, i.e.

\[ F_X(x_1, x_2, \ldots, x_n) = F_{X_1}(x_1)F_{X_2}(x_2) \cdots F_{X_n}(x_n) \]

with obvious notation. Similarly for the joint densities and/or mass functions.


Definition 2.3.5 Let $P$ and $Q$ be two equivalent probability measures on $(\Omega, \mathcal{A})$. A function $f$ such that

\[ Q(A) = \int_A f\,dP \]

is called the Radon-Nikodým derivative and it is usually denoted as

\[ f = \frac{dQ}{dP}. \]

² We denote by $A^\top$ the transpose of $A$.
The density of a continuous random variable is defined exactly in this way. Indeed

\[ P_X((-\infty, x]) = P(X \in (-\infty, x]) = F(x) = \int_{-\infty}^{x} f(u)\,du, \]

where $du = \lambda(du)$ is nothing but the Lebesgue measure of the interval $du$. Thus, the density $f$ is the Radon-Nikodým derivative of $P_X$ with respect to the Lebesgue measure $\lambda$, i.e. $f = dP_X/d\lambda$.

Definition 2.3.6 We define the expected value of a random variable $X$ as the following integral

\[ E(X) = \int_{\Omega} X(\omega)P(d\omega) = \int_{\mathbb{R}} x\,dF(x) \]

where the last integral is a Riemann-Stieltjes integral. If $X$ is continuous we have $E(X) = \int_{\mathbb{R}} x\,dF(x) = \int_{\mathbb{R}} x f(x)\,dx$ and when $X$ is discrete $E(X) = \sum_i x_i p_X(x_i)$. The variance of $X$ is defined as

\[ \mathrm{Var}(X) = E(X - E\{X\})^2 = \int_{\Omega} (X(\omega) - E\{X\})^2 P(d\omega). \]

The $n$-th moment of a random variable is defined as $\mu_n = E\{X^n\}$.

2.3.6.1 Some properties of the expected value operator 

The expected value also has several properties which we mention without proof. Let $X$ and $Y$ be two random variables and $c$, $M$ some real constants. Then

(a) $E(X \pm Y) = E(X) \pm E(Y)$;

(b) if $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$;

(c) if $c$ is a constant, then $E(cX) = cE(X)$ and $E(c) = c$;

(d) if $X \ge 0$, then $E(X) \ge 0$;

(e) if $X \ge Y$ (i.e. $X(\omega) - Y(\omega) \ge 0$, $\forall \omega \in \Omega$), then $E(X) \ge E(Y)$;

(f) let $\mathbf{1}_A(\omega)$ be the random variable which takes value 1 if $\omega \in A$ and 0 otherwise; then $E(\mathbf{1}_A) = P(A)$;

(g) if $|X| \le M$ (i.e. $|X(\omega)| \le M$, $\forall \omega \in \Omega$) then $|E(X\mathbf{1}_A)| \le M P(A)$;

(h) $P(X \in B) = E(\mathbf{1}_{\{X \in B\}}) = \int_A P(d\omega) = E(\mathbf{1}_A)$, with $A = X^{-1}(B)$.

When a random variable $X$ is such that $E|X| < \infty$ we say that $X$ is integrable; in other words, $X$ is integrable if $|X|$ is integrable. It is easy to see that $\mathrm{Var}(X) = E(X^2) - (E\{X\})^2$ by simple expansion of the binomial $(X - E(X))^2$. In general, given a transform $Y = g(X)$ of a random variable $X$, with $g(\cdot)$ a measurable function, it is possible to calculate $E(Y)$ as

\[ E(Y) = E\{g(X)\} = \int_{\mathbb{R}} g(x)\,dF(x). \]
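When $X$ has a density, these integrals can be approximated with R's `integrate`. The following is a small sketch for $X \sim U(0, 1)$; the choice of distribution and of $g(x) = e^x$ is ours, made only because the exact answers are known in closed form.

```r
# First two moments of X ~ U(0, 1) by numerical integration of x^k f(x)
EX  <- integrate(function(x) x   * dunif(x), 0, 1)$value
EX2 <- integrate(function(x) x^2 * dunif(x), 0, 1)$value

# Var(X) = E(X^2) - (E X)^2, to be compared with the exact value 1/12
VarX <- EX2 - EX^2
VarX

# E{g(X)} with g(x) = exp(x), to be compared with the exact value e - 1
Eg <- integrate(function(x) exp(x) * dunif(x), 0, 1)$value
Eg
```

The numerical values agree with the closed forms up to the integration error reported by `integrate`.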

Definition 2.3.7 The covariance between two random variables $X$ and $Y$ is the quantity

\[ \mathrm{Cov}(X, Y) = E\{(X - E(X))(Y - E(Y))\} = E(XY) - E(X)E(Y) \]

where

\[ E(XY) = \int_{\Omega} X(\omega)Y(\omega)P(d\omega) \]

is the mixed moment of $X$ and $Y$. When $X$ and $Y$ are both discrete $E(XY) = \sum_i \sum_j x_i y_j\, p_{XY}(x_i, y_j)$ and when both are continuous $E(XY) = \int_{\mathbb{R}} \int_{\mathbb{R}} x y\, f_{XY}(x, y)\,dx\,dy$.


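The covariance between two random variables is the notion of joint variability and is a direct extension of the notion of variance in the univariate case. Indeed, $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$. For a random vector $X = (X_1, X_2, \ldots, X_n)$ it is usually worth introducing the variance-covariance matrix between the components $(X_i, X_j)$, $i, j = 1, \ldots, n$, which is the following matrix

\[
\mathrm{Var}(X) =
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{pmatrix}
\]

with $\sigma_i^2 = \mathrm{Var}(X_i)$, $i = 1, \ldots, n$, and $\sigma_{ij} = \mathrm{Cov}(X_i, X_j)$, $i \ne j$.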

Exercise 2.5 Prove that if $X$ and $Y$ are independent then $\mathrm{Cov}(X, Y) = 0$. Provide a counterexample to show that the converse is not true, i.e. $\mathrm{Cov}(X, Y) = 0$ does not imply that $X$ and $Y$ are independent.


If $\{X_i, i = 1, \ldots, n\}$ is a family of random variables and $a_i$, $i = 1, \ldots, n$, are some constants, then

\[ E\Big\{\sum_{i=1}^{n} a_i X_i\Big\} = \sum_{i=1}^{n} a_i E(X_i). \tag{2.5} \]

Moreover, if the $X_i$ are mutually independent, then

\[ \mathrm{Var}\Big\{\sum_{i=1}^{n} a_i X_i\Big\} = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i). \tag{2.6} \]

If the random variables are not mutually independent the formula for the variance takes the form:

\[ \mathrm{Var}\Big\{\sum_{i=1}^{n} a_i X_i\Big\} = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) + 2 \sum_{i,j:\, i<j} a_i a_j\, \mathrm{Cov}(X_i, X_j). \]
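In matrix notation the last formula is the quadratic form $a^\top \Sigma\, a$, with $\Sigma$ the variance-covariance matrix of the $X_i$. The R sketch below checks that the two expressions coincide; the matrix `Sigma` and the weights `a` are arbitrary illustrative values, not data from the book.

```r
# A symmetric, positive definite variance-covariance matrix (illustrative)
Sigma <- matrix(c(4.0, 1.0, 0.5,
                  1.0, 9.0, 2.0,
                  0.5, 2.0, 1.0), nrow = 3, byrow = TRUE)
a <- c(1, -2, 3)   # coefficients of the linear combination

# Quadratic form a' Sigma a
lhs <- drop(a %*% Sigma %*% a)

# Sum of a_i^2 Var(X_i) plus twice the sum over i < j of a_i a_j Cov(X_i, X_j)
W   <- outer(a, a) * Sigma
rhs <- sum(diag(W)) + 2 * sum(W[upper.tri(W)])

c(lhs = lhs, rhs = rhs)
```

Setting the off-diagonal entries of `Sigma` to zero reduces the computation to formula (2.6).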

Definition 2.3.8 A random variable $X$ is $L^p$ integrable, and we write $X \in L^p(\Omega, \mathcal{A}, P)$, or simply $X \in L^p$, if $\int_{\Omega} |X(\omega)|^p P(d\omega) = \int_{\mathbb{R}} |x|^p\,dF(x) < \infty$. We call $X \in L^2$ a square integrable random variable.

2.3.1 Characteristic function 

The characteristic function of a random variable $X$ is the following integral transform

\[ \varphi(t) = E\{e^{itX}\} = \int_{-\infty}^{\infty} e^{itx}\,dF(x) \]

where $i$ is the imaginary unit. When $X$ has a density, the characteristic function becomes $\varphi(t) = \int_{\mathbb{R}} e^{itx} f(x)\,dx$. The characteristic function has the following elementary properties

(i) $\varphi(0) = 1$;

(ii) $|\varphi(t)| \le 1$ for all $t$, indeed $|\varphi(t)| = |E\{e^{itX}\}| \le E|e^{itX}| \le 1$ as $|e^{ix}| \le 1$;

(iii) $\varphi(t) = \overline{\varphi(-t)}$, where $\bar{z}$ is the complex conjugate of $z = a + ib$, i.e. $\bar{z} = a - ib$;

(iv) the function $\varphi(t)$ is a continuous function for all real numbers $t$.

The characteristic function uniquely identifies the probability law of a random 
variable, so each random variable has one and only one characteristic function 
and each characteristic function corresponds to a single random variable. We will 
make use of this property in Section 2.3.7. 

Theorem 2.3.9 If $\{X_i, i = 1, \ldots, n\}$ is a family of mutually independent random variables and $a_i$, $i = 1, \ldots, n$, a sequence of constants, then the characteristic function of $S_n = \sum_{i=1}^{n} a_i X_i$ satisfies the following equality

\[ \varphi_{S_n}(t) = \varphi_{X_1}(a_1 t)\,\varphi_{X_2}(a_2 t) \cdots \varphi_{X_n}(a_n t). \tag{2.7} \]

Proof Indeed

\[ \varphi_{S_n}(t) = E\{e^{itS_n}\} = E\Big\{\prod_{i=1}^{n} e^{ita_i X_i}\Big\} = \prod_{i=1}^{n} E\{e^{ita_i X_i}\}, \]

where independence has been used in the last equality.
A corollary of the last theorem is the following: consider a sequence of independent and identically distributed (i.i.d.) random variables $\{X_i, i = 1, \ldots, n\}$ and define $S_n = \sum_{i=1}^{n} X_i$. Then,

\[ \varphi_{S_n}(t) = \{\varphi_X(t)\}^n, \]

where $\varphi_X(\cdot)$ is the common characteristic function of the $X_i$'s. When the moment of order $n$ of the random variable exists, it is possible to obtain it by $n$-times differentiation of the characteristic function with respect to $t$ as follows:

\[ E(X^n) = i^{-n} \frac{d^n}{dt^n} \varphi(t) \Big|_{t=0}. \tag{2.8} \]

This is easy to see: consider $\frac{d}{dt}\varphi(t) = E\{iXe^{itX}\}$, which evaluated at $t = 0$ gives $iE(X)$. Then by induction one proves (2.8).
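Both the corollary and (2.8) can be checked numerically with R's complex arithmetic. The sketch below uses the Bernoulli and Binomial laws introduced later in this chapter; the function names, the parameter values and the step `h` of the finite difference are illustrative assumptions.

```r
n <- 5; p <- 0.3
k  <- 0:n
tt <- seq(-3, 3, by = 0.5)   # grid of real arguments t

# Characteristic function of a Bernoulli(p) random variable
phi_ber <- function(t) 1 - p + p * exp(1i * t)

# Characteristic function of the sum of n i.i.d. Bernoulli(p) variables,
# computed directly as E{exp(itY)} with Y ~ Bin(n, p)
phi_bin <- function(t) sapply(t, function(u) sum(exp(1i * u * k) * dbinom(k, n, p)))

# Corollary: phi_{S_n}(t) = phi_X(t)^n
max_err <- max(Mod(phi_bin(tt) - phi_ber(tt)^n))
max_err

# Formula (2.8) with n = 1: E(Y) = phi'(0) / i, with phi'(0) approximated
# by a central finite difference of step h
h  <- 1e-4
m1 <- Re((phi_ber(h)^n - phi_ber(-h)^n) / (2 * h) / 1i)
m1   # close to n * p
```

The finite difference is used only for simplicity; its error depends on `h` and it is not meant as a general method for computing moments.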


2.3.2 Moment generating function 

Closely related to the characteristic function is the so-called moment generating function of a random variable $X$, which is defined as follows:

\[ M(\alpha) = E\{e^{\alpha X}\} = \int_{-\infty}^{\infty} e^{\alpha x}\,dF(x), \quad \alpha \in \mathbb{R}. \]

It is easy to prove that

\[ E(X^n) = \frac{d^n}{d\alpha^n} M(\alpha) \Big|_{\alpha=0}, \]

from which its name. Further, under the same conditions for which (2.7) holds, we have that

\[ M_{S_n}(\alpha) = M_{X_1}(a_1 \alpha)\,M_{X_2}(a_2 \alpha) \cdots M_{X_n}(a_n \alpha). \tag{2.9} \]


2.3.3 Examples of random variables 

We will mention here a few random variables which will play a central role in 
the next chapters. 

2.3.3.1 Bernoulli random variable 

The Bernoulli random variable takes only the two values 1 and 0, respectively with probability $p$ and $1 - p$, i.e. $P(X = 1) = p$, $P(X = 0) = 1 - p$, with $0 < p < 1$. It is usually interpreted as the indicator variable of some related events (for example, failure-functioning, on/off, etc.).

Exercise 2.6 Prove that $E(X) = p$, $\mathrm{Var}(X) = p(1 - p)$ and $\varphi(t) = 1 - p + pe^{it}$.





We denote the Bernoulli random variable $X$ as $X \sim \mathrm{Ber}(p)$. This variable is the building block of the following Binomial random variable.

2.3.3.2 Binomial random variable

Let $X_i$, $i = 1, \ldots, n$, be a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables with parameter $p$, i.e. $X_i \sim \mathrm{Ber}(p)$. The random variable which counts the number of ones (successes) in a sequence of $n$ Bernoulli trials is called the Binomial random variable. More precisely

\[ Y = \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(n, p) \]

where $\mathrm{Bin}(n, p)$ stands for the Binomial law³ with parameters $n$ and $p$, which is the following discrete distribution

\[ P(Y = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n. \tag{2.10} \]

Exercise 2.7 Prove that $E(Y) = np$, $\mathrm{Var}(Y) = np(1 - p)$ and $\varphi(t) = (1 - p + pe^{it})^n$.

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Binomial random variable are of the form [dpqr]binom.
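As a brief illustration of the [dpqr] naming scheme, the following R sketch compares `dbinom` with formula (2.10) and recovers the mean and variance of Exercise 2.7 by direct summation. The parameter values and the seed are arbitrary.

```r
n <- 10; p <- 0.3
k <- 0:n

d <- dbinom(k, n, p)                         # P(Y = k)
f <- choose(n, k) * p^k * (1 - p)^(n - k)    # formula (2.10)
all.equal(d, f)

cdf <- pbinom(k, n, p)                       # P(Y <= k)
all.equal(cdf, cumsum(d))

qbinom(0.5, n, p)                            # median of Bin(n, p)

set.seed(1)                                  # arbitrary seed
rbinom(5, n, p)                              # five pseudo-random draws

EY   <- sum(k * d)                           # mean, n * p
VarY <- sum(k^2 * d) - EY^2                  # variance, n * p * (1 - p)
c(mean = EY, var = VarY)
```

The same pattern, with the prefix `d`, `p`, `q` or `r`, applies to the other distributions listed in this section.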

2.3.3.3 Poisson random variable 

The Poisson random variable is the limit (as $n \to \infty$) of the binomial experiment in the case of rare events, i.e. when the probability of observing one event is close to zero and $n \cdot p = \lambda$ remains fixed as $n$ increases. The Poisson law of parameter $\lambda$, $\mathrm{Poi}(\lambda)$, $\lambda > 0$, is the following probability mass function

\[ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, \ldots \tag{2.11} \]

We denote a Poisson random variable $X$ by $X \sim \mathrm{Poi}(\lambda)$.

Exercise 2.8 Prove that $X$ has both mean and variance equal to $\lambda$ and characteristic function $\varphi(t) = \exp\{\lambda(e^{it} - 1)\}$.

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Poisson random variable are of the form [dpqr]pois.
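The limiting relation between the Binomial and the Poisson laws can be observed numerically. In the sketch below $\lambda$ is kept fixed while $n$ grows and $p = \lambda/n$; the values of `lambda`, `k` and `sizes` are illustrative choices.

```r
lambda <- 2
k <- 0:10                        # values at which the two laws are compared
sizes <- c(10, 100, 1e4, 1e6)    # increasing number of trials n

# Largest absolute difference between Bin(n, lambda / n) and Poi(lambda)
# over the values in k, for each n
err <- sapply(sizes, function(n)
  max(abs(dbinom(k, n, lambda / n) - dpois(k, lambda))))

data.frame(n = sizes, max_abs_diff = err)
```

The discrepancy shrinks as $n$ increases, in agreement with the rare-events interpretation given above.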


³ In Equation (2.10) the term $\binom{n}{k}$ is the binomial coefficient: $\binom{n}{k} = \frac{n!}{k!(n-k)!}$, with $n! = n \cdot (n - 1) \cdot (n - 2) \cdots 2 \cdot 1$.

2.3.3.4 Uniform random variable

Let $[a, b]$ be a finite interval. The Uniform random variable $X \sim U(a, b)$ is a continuous random variable with density

\[
f(x) =
\begin{cases}
\dfrac{1}{b - a}, & x \in (a, b), \\
0, & \text{otherwise;}
\end{cases}
\]

and distribution function

\[
F(x) =
\begin{cases}
0, & x < a, \\
\dfrac{x - a}{b - a}, & x \in [a, b], \\
1, & x > b.
\end{cases}
\]

Exercise 2.9 Prove that

\[ E(X) = \frac{a + b}{2}, \quad \mathrm{Var}(X) = \frac{(b - a)^2}{12} \quad \text{and} \quad \varphi(t) = \frac{e^{itb} - e^{ita}}{it(b - a)}. \]

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Uniform random variable are of the form [dpqr]unif.


2.3.3.5 Exponential random variable 

This variable is related to the Poisson random variable. While the Poisson random variable counts the number of rare events, the exponential random variable measures the time between two occurrences of Poisson events, so it is a positive and continuous random variable. The exponential random variable $X$ of parameter $\lambda > 0$, i.e. $X \sim \mathrm{Exp}(\lambda)$, has density

\[
f(x) =
\begin{cases}
\lambda e^{-\lambda x}, & x > 0, \\
0, & \text{otherwise;}
\end{cases}
\]

and distribution function

\[
F(x) =
\begin{cases}
0, & x < 0, \\
1 - e^{-\lambda x}, & x \ge 0.
\end{cases}
\]

Let $Y \sim \mathrm{Poi}(\lambda)$, then

\[ P(Y = 0) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-\lambda} = 1 - P(X < 1) = P(X > 1), \]

i.e. the probability of no events in the time unit is equivalent to waiting more than 1 unit of time for the first event to occur. In general, if $x$ is some amount of time, then $P(Y = 0)$ with $Y \sim \mathrm{Poi}(\lambda x)$ is equal to $P(X > x)$, with $X \sim \mathrm{Exp}(\lambda)$.
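The relation between the Poisson and the exponential laws can be verified directly with `dpois` and `pexp`. This is a minimal sketch; the rate `lambda` and the time points `x` are arbitrary.

```r
lambda <- 1.5
x <- c(0.5, 1, 2, 5)   # amounts of time

# P(Y = 0) with Y ~ Poi(lambda * x): no event in a time span of length x
p_no_event <- dpois(0, lambda * x)

# P(X > x) with X ~ Exp(lambda): the first event occurs after time x
p_wait <- pexp(x, rate = lambda, lower.tail = FALSE)

cbind(x, p_no_event, p_wait)
```

Both columns equal $e^{-\lambda x}$, up to floating point error.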
Exercise 2.10 Prove that

\[ E(X) = \frac{1}{\lambda}, \quad \mathrm{Var}(X) = \frac{1}{\lambda^2} \quad \text{and} \quad \varphi(t) = \frac{\lambda}{\lambda - it}. \]

An interesting property of the exponential distribution is that it is memoryless, which means that, if $X$ is an exponential random variable, then

\[ P(X > t + s \,|\, X > t) = P(X > s), \quad \forall s, t \ge 0, \]

which is easy to prove. Indeed,

\[ P(X > t + s \,|\, X > t) = \frac{P(X > t + s)}{P(X > t)} = \frac{1 - F(t + s)}{1 - F(t)} = \frac{e^{-\lambda(t+s)}}{e^{-\lambda t}} = e^{-\lambda s} = P(X > s). \]

The memoryless property means that, given that no event occurred before time $t$, the probability that we need to wait $s$ more instants for the event to occur is the same as that of waiting $s$ instants from the origin. For example, if the time unit is seconds and no event occurred in the first 10 s, then the probability that the event occurs after 30 s, i.e. that we wait for another 20 s or more, is $P(X > 30 \,|\, X > 10) = P(X > 20)$.

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Exponential random variable are of the form [dpqr]exp.
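The memoryless property can be checked with `pexp` on the numbers of the example above. The rate `lambda` is an arbitrary illustrative value; the identity holds for any positive rate.

```r
lambda <- 0.1    # arbitrary rate, in events per second
t0 <- 10         # no event in the first 10 s
s  <- 20         # additional waiting time

# Survival function P(X > x) of Exp(lambda)
surv <- function(x) pexp(x, rate = lambda, lower.tail = FALSE)

lhs <- surv(t0 + s) / surv(t0)   # P(X > 30 | X > 10)
rhs <- surv(s)                   # P(X > 20)
c(lhs = lhs, rhs = rhs)
```

The two numbers coincide up to floating point error.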


2.3.3.6 Gamma random variable 


The Gamma random variable, $X \sim \mathrm{Gamma}(\alpha, \beta)$, has a continuous distribution with two parameters $\alpha > 0$ and $\beta > 0$. The density function of its law has the form:

\[ f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0, \]

where the function

\[ \Gamma(k) = \int_0^{\infty} t^{k-1} e^{-t}\,dt \]

is called the gamma function. The $\Gamma$ function has several properties. We just list them without proof

• if $k$ is an integer, then $\Gamma(k + 1) = k!$;

• $\dfrac{\Gamma(x + 1)}{\Gamma(x)} = x$.

The Gamma distribution is such that, if $X \sim \mathrm{Gamma}(\alpha, \beta)$, then

\[ E(X) = \frac{\alpha}{\beta}, \quad \mathrm{Var}(X) = \frac{\alpha}{\beta^2} \]

and its characteristic function is

\[ \varphi(t) = E\{e^{itX}\} = \Big(1 - \frac{it}{\beta}\Big)^{-\alpha}. \]

The Gamma distribution includes several special cases, the most important ones being the exponential random variable and the $\chi^2$ random variable (see below). Indeed, $\mathrm{Gamma}(1, \beta) = \mathrm{Exp}(\beta)$ and $\mathrm{Gamma}\big(\frac{n}{2}, \frac{1}{2}\big) = \chi^2_n$.

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Gamma random variable are of the form [dpqr]gamma.
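The two special cases, and the formulas for the mean and the variance, can be checked numerically. The sketch below uses the `shape`/`rate` parametrization of `dgamma`, with `shape` playing the role of $\alpha$ and `rate` that of $\beta$; the grid and the parameter values are arbitrary.

```r
x <- seq(0.1, 5, by = 0.1)

# Gamma(1, beta) = Exp(beta), here with beta = 2
d_exp <- max(abs(dgamma(x, shape = 1, rate = 2) - dexp(x, rate = 2)))

# Gamma(n/2, 1/2) = chi-square with n degrees of freedom, here with n = 3
d_chi <- max(abs(dgamma(x, shape = 3 / 2, rate = 1 / 2) - dchisq(x, df = 3)))
c(d_exp, d_chi)

# Mean and variance of Gamma(alpha, beta) by numerical integration
alpha <- 3; beta <- 2
m1 <- integrate(function(u) u   * dgamma(u, shape = alpha, rate = beta), 0, Inf)$value
m2 <- integrate(function(u) u^2 * dgamma(u, shape = alpha, rate = beta), 0, Inf)$value
c(mean = m1, var = m2 - m1^2)   # compare with alpha / beta and alpha / beta^2
```

Note that `dgamma` also accepts a `scale` argument, equal to `1 / rate`, so the parametrization should always be stated explicitly.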

2.3.3.7 Gaussian random variable 

The Gaussian or Normal random variable is a continuous distribution with two parameters $\mu$ and $\sigma^2$ which correspond respectively to its mean and variance. The density function of its law $N(\mu, \sigma^2)$ is

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}. \]

Its characteristic function is given by

\[ \varphi(t) = E\{e^{itX}\} = \exp\Big\{i\mu t - \frac{\sigma^2 t^2}{2}\Big\}. \]

Exercise 2.11 Derive by explicit calculations the mean, the variance and the characteristic function of the Gaussian random variable.

The very special case of $N(0, 1)$ is called a standard normal random variable. This random variable is also symmetric around its mean.

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Gaussian random variable are of the form [dpqr]norm.

2.3.3.8 Chi-square random variable 

The Chi-square distribution has one parameter $n$, the degrees of freedom, and it is a non-negative continuous random variable denoted as $\chi^2_n$ with density

\[ f(x) = \frac{1}{2^{\frac{n}{2}}\,\Gamma\big(\frac{n}{2}\big)}\, x^{\frac{n}{2} - 1} e^{-\frac{x}{2}}, \quad x > 0, \]

and characteristic function

\[ \varphi(t) = E\{e^{itX}\} = (1 - 2it)^{-\frac{n}{2}}. \]

If $X \sim \chi^2_n$, then

\[ E(X) = n, \quad \mathrm{Var}(X) = 2n. \]

The square of a Gaussian random variable $N(0, 1)$ is $\chi^2_1$ distributed. Along with the standard Chi-square distribution it is possible to define the noncentral Chi-square random variable with density

\[ f(x) = \frac{1}{2}\, e^{-\frac{x + \delta}{2}} \Big(\frac{x}{\delta}\Big)^{\frac{n}{4} - \frac{1}{2}} I_{\frac{n}{2} - 1}\big(\sqrt{\delta x}\big), \]

where $n$ are the degrees of freedom, $\delta > 0$ is the noncentrality parameter and the random variable is denoted by $\chi^2_n(\delta)$. The function $I_k(x)$ is the modified Bessel function of the first kind (see Abramowitz and Stegun 1964). If $X \sim \chi^2_n(\delta)$, then

\[ \varphi(t) = E\{e^{itX}\} = \frac{\exp\big\{\frac{it\delta}{1 - 2it}\big\}}{(1 - 2it)^{\frac{n}{2}}} \]

and

\[ E(X) = n + \delta, \quad \mathrm{Var}(X) = 2(n + 2\delta). \]

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Chi-square random variable are of the form [dpqr]chisq.


2.3.3.9 Student’s t random variable 


The Student's t random variable is symmetric and continuous with zero mean and a parameter $n$, the degrees of freedom. It arises in many contexts and in particular in statistics. If a random variable has a Student's t distribution with $n > 0$ degrees of freedom we write $X \sim t_n$. The density of $X$ is

\[ f(x) = \frac{\Gamma\big(\frac{n+1}{2}\big)}{\sqrt{n\pi}\,\Gamma\big(\frac{n}{2}\big)} \Big(1 + \frac{x^2}{n}\Big)^{-\frac{n+1}{2}}, \quad x \in \mathbb{R}, \]

and its characteristic function is

\[ \varphi(t) = E\{e^{itX}\} = \frac{K_{\frac{n}{2}}\big(\sqrt{n}|t|\big)\big(\sqrt{n}|t|\big)^{\frac{n}{2}}}{\Gamma\big(\frac{n}{2}\big)\,2^{\frac{n}{2} - 1}}. \]

Moreover, $E(X) = 0$ for $n > 1$ and

\[
\mathrm{Var}(X) =
\begin{cases}
\dfrac{n}{n - 2}, & n > 2, \\
\infty, & 1 < n \le 2, \\
\text{undefined} & \text{otherwise.}
\end{cases}
\]

If $Z \sim N(0, 1)$ and $Y \sim \chi^2_n$, with $Z$ and $Y$ independent, then $t = Z/\sqrt{Y/n} \sim t_n$. The noncentral version of the Student's t distribution exists as well.

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Student's t random variable are of the form [dpqr]t.
2.3.3.10 Cauchy-Lorentz random variable

The Cauchy or Lorentz random variable $\mathrm{Cauchy}(\gamma, \delta)$ has a continuous distribution with two parameters $\gamma$ and $\delta$ and density

\[ f(x) = \frac{1}{\pi}\, \frac{\gamma}{\gamma^2 + (x - \delta)^2}, \quad x \in \mathbb{R}, \]

and cumulative distribution function

\[ F_X(x) = \frac{1}{\pi} \arctan\Big(\frac{x - \delta}{\gamma}\Big) + \frac{1}{2}. \]

This distribution is characterized by the fact that it has no finite moments, but the mode and the median are both equal to $\delta$. Its characteristic function is given by

\[ \varphi(t) = E\{e^{itX}\} = \exp\{\delta it - \gamma|t|\}. \]

If $X$ and $Y$ are two independent standard Gaussian random variables, then the ratio $X/Y$ is a Cauchy random variable with parameters $(0, 1)$. The Student's t distribution with $n = 1$ degrees of freedom is again the Cauchy random variable of parameters $(0, 1)$.

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Cauchy random variable are of the form [dpqr]cauchy.
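Two of the statements above can be checked numerically: the identity between the Student's t law with one degree of freedom and the standard Cauchy law, and the closed form of the distribution function. In the sketch below `location` plays the role of $\delta$ and `scale` that of $\gamma$ in `pcauchy`; the grid and the parameter values are arbitrary.

```r
x <- seq(-5, 5, by = 0.5)

# Student's t with n = 1 degrees of freedom versus Cauchy with parameters (0, 1)
d_t1 <- max(abs(dt(x, df = 1) - dcauchy(x, location = 0, scale = 1)))

# Closed form of the distribution function versus pcauchy
gam <- 2; del <- 1                              # illustrative gamma and delta
F_closed <- atan((x - del) / gam) / pi + 1 / 2
d_cdf <- max(abs(pcauchy(x, location = del, scale = gam) - F_closed))

c(d_t1, d_cdf)

# The median equals delta
qcauchy(0.5, location = del, scale = gam)
```

Both differences are of the order of floating point error.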


2.3.3.11 Beta random variable 


The Beta random variable, $X \sim \mathrm{Beta}(\alpha, \beta)$, $\alpha, \beta > 0$, has a continuous distribution with support in $[0, 1]$ and density

\[ f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1}, \quad 0 < x < 1. \]

If $X \sim \mathrm{Beta}(\alpha, \beta)$, then

\[ E(X) = \frac{\alpha}{\alpha + \beta}, \quad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}, \]

while the characteristic function is expressed using series expansion formulas. The name of this distribution comes from the fact that the normalizing constant is expressed through the so-called Beta function

\[ \mathrm{Beta}(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}. \]

A special case of this distribution is the $\mathrm{Beta}(1, 1)$ which corresponds to the uniform distribution $U(0, 1)$. This distribution has a compact support with a density that can be flat, concave U-shaped or convex U-shaped, symmetric (for $\alpha = \beta$) or asymmetric, depending on the choice of the parameters $\alpha$ and $\beta$.

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the Beta random variable are of the form [dpqr]beta.


2.3.3.12 The log-normal random variable 

The log-Normal random variable, $X \sim \log N(\mu, \sigma^2)$, has a continuous distribution with two parameters $\mu$ and $\sigma^2$ and density

\[ f(x) = \frac{1}{x\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\log x - \mu)^2}{2\sigma^2}}, \quad x > 0. \]

Its cumulative distribution function is given by

\[ F_X(x) = \Phi\Big(\frac{\log x - \mu}{\sigma}\Big), \]

where $\Phi(\cdot)$ is the cumulative distribution function of the standard Gaussian random variable. The log-Normal distribution is sometimes called Galton's distribution. This random variable is called log-Normal because, if $X \sim N(\mu, \sigma^2)$ then $Y = \exp\{X\} \sim \log N(\mu, \sigma^2)$. Its characteristic function $\varphi(t)$ exists if $\mathrm{Im}(t) \le 0$. The moments of the log-Normal distribution are given by the formula

\[ E(X^k) = e^{k\mu + \frac{1}{2}k^2\sigma^2} \]

and, in particular, we have

\[ E(X) = e^{\mu + \frac{\sigma^2}{2}}, \quad \mathrm{Var}(X) = \big(e^{\sigma^2} - 1\big)\, e^{2\mu + \sigma^2} \]

and

\[ \mu = \log E(X) - \frac{1}{2}\log\Big(1 + \frac{\mathrm{Var}(X)}{E(X)^2}\Big), \quad \sigma^2 = \log\Big(1 + \frac{\mathrm{Var}(X)}{E(X)^2}\Big). \]

The log-Normal takes a particular role in the Black and Scholes model presented in Chapter 6. In particular, the next result plays a role in the price formula of European call options (see Section 6.2.1):

\[ E\big(X\mathbf{1}_{\{X > k\}}\big) = \int_k^{\infty} x f(x)\,dx = e^{\mu + \frac{1}{2}\sigma^2}\, \Phi\Big(\frac{\mu + \sigma^2 - \log k}{\sigma}\Big). \]

R functions to obtain density, cumulative distribution function, quantiles and random numbers for the log-normal random variable are of the form [dpqr]lnorm.
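The moment formula and the truncated expectation can be checked by numerical integration of the density returned by `dlnorm`. This is only a sketch: the values of `mu`, `sigma` and of the threshold `k0` are arbitrary, and `sdlog` in `dlnorm` is the standard deviation $\sigma$, not the variance.

```r
mu <- 0; sigma <- 0.5; k0 <- 1   # illustrative parameter values

# Second moment E(X^2) by integration, versus exp(2 mu + 2 sigma^2)
m2_num    <- integrate(function(x) x^2 * dlnorm(x, meanlog = mu, sdlog = sigma),
                       0, Inf)$value
m2_closed <- exp(2 * mu + 0.5 * 2^2 * sigma^2)

# Truncated expectation E(X 1{X > k0}) by integration, versus the closed form
te_num    <- integrate(function(x) x * dlnorm(x, meanlog = mu, sdlog = sigma),
                       k0, Inf)$value
te_closed <- exp(mu + sigma^2 / 2) * pnorm((mu + sigma^2 - log(k0)) / sigma)

rbind(second_moment = c(numeric = m2_num, closed = m2_closed),
      truncated     = c(numeric = te_num, closed = te_closed))
```

The agreement is limited only by the accuracy of `integrate`.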
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2.3.3.13 Normal inverse Gaussian 


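The Normal Inverse Gaussian distribution, or NIG($\alpha, \beta, \delta, \mu$), was
introduced in finance by Barndorff-Nielsen (1997). Its density has the form:

$$
f(x) = \frac{\alpha\delta}{\pi}
\exp\left\{\delta\sqrt{\alpha^2 - \beta^2} + \beta(x - \mu)\right\}
\frac{K_1\left(\alpha\sqrt{\delta^2 + (x - \mu)^2}\right)}{\sqrt{\delta^2 + (x - \mu)^2}},
\qquad x \in \mathbb{R},
\tag{2.12}
$$

where $K_1$ denotes the Bessel function of the third kind with index 1 (Abramowitz
and Stegun 1964). In this distribution $\mu$ is a location parameter, $\alpha$
represents heaviness of the tails, $\beta$ the asymmetry and $\delta$ the scale
parameter. The characteristic function of this distribution has the following
simple form:

$$
\varphi_X(t) = E\left\{e^{itX}\right\} = e^{it\mu}\,
\frac{\exp\left\{\delta\sqrt{\alpha^2 - \beta^2}\right\}}
{\exp\left\{\delta\sqrt{\alpha^2 - (\beta + it)^2}\right\}}.
$$

If $X \sim NIG(\alpha, \beta, \delta, \mu)$, then

$$
E(X) = \mu + \frac{\beta\delta}{\sqrt{\alpha^2 - \beta^2}}, \qquad
Var(X) = \frac{\alpha^2\delta}{\left(\alpha^2 - \beta^2\right)^{3/2}}.
$$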

R functions to obtain density, cumulative distribution function, quantiles and
random numbers for the normal inverse Gaussian random variable are available
in package fBasics and are of the form [dpqr]nig.
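
To see the NIG density at work without extra packages, the sketch below codes
formula (2.12) in base R with `besselK` and checks numerically that it
integrates to one and that its first two moments match the NIG mean
$\mu + \beta\delta/\sqrt{\alpha^2-\beta^2}$ and variance
$\alpha^2\delta/(\alpha^2-\beta^2)^{3/2}$. The helper `dnig.sketch` is a
hypothetical name introduced here (it is not the `dnig` function of fBasics),
and the parameter values are arbitrary.

```r
# NIG density (2.12) coded directly from the formula; dnig.sketch is a
# hypothetical helper, not the dnig function of package fBasics
dnig.sketch <- function(x, alpha, beta, delta, mu) {
  q <- sqrt(delta^2 + (x - mu)^2)
  g <- sqrt(alpha^2 - beta^2)
  # besselK(z, 1, expon.scaled = TRUE) returns exp(z) * K_1(z); folding
  # exp(-z) into the exponential avoids underflow/overflow in the tails
  alpha * delta / pi * exp(delta * g + beta * (x - mu) - alpha * q) *
    besselK(alpha * q, 1, expon.scaled = TRUE) / q
}

alpha <- 2; beta <- 0.5; delta <- 1; mu <- 0   # illustrative values
g <- sqrt(alpha^2 - beta^2)
f <- function(x) dnig.sketch(x, alpha, beta, delta, mu)

total <- integrate(f, -Inf, Inf)$value
m1 <- integrate(function(x) x * f(x), -Inf, Inf)$value
m2 <- integrate(function(x) x^2 * f(x), -Inf, Inf)$value

c(total = total)
c(mean = m1, mean.formula = mu + beta * delta / g)
c(var = m2 - m1^2, var.formula = alpha^2 * delta / g^3)
```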


2.3.3.14 Generalized hyperbolic distribution 

The Generalized Hyperbolic distribution, or GH($\alpha, \beta, \delta, \mu, \lambda$),
was introduced in Eberlein and Prause (2002) as a generalization of the
hyperbolic distribution. As special cases, it includes the normal inverse
Gaussian law for $\lambda = -\frac{1}{2}$ and the hyperbolic distribution for
$\lambda = 1$ (see Barndorff-Nielsen (1977)). Its density has the form:

$$
f(x) = c(\lambda, \alpha, \beta, \delta)
\left(\delta^2 + (x - \mu)^2\right)^{\frac{\lambda - 1/2}{2}}
K_{\lambda - \frac{1}{2}}\left(\alpha\sqrt{\delta^2 + (x - \mu)^2}\right)
\exp\left\{\beta(x - \mu)\right\}, \qquad x \in \mathbb{R},
\tag{2.13}
$$

with

$$
c(\lambda, \alpha, \beta, \delta) =
\frac{\left(\alpha^2 - \beta^2\right)^{\lambda/2}}
{\sqrt{2\pi}\,\alpha^{\lambda - \frac{1}{2}}\,\delta^{\lambda}\,
K_\lambda\left(\delta\sqrt{\alpha^2 - \beta^2}\right)}
$$

and $K_\lambda$ is the Bessel function of the third kind of index $\lambda$
(Abramowitz and Stegun 1964). The parameters $\alpha$, $\beta$, $\delta$ and $\mu$
have the same interpretation as in the NIG distribution. The characteristic
function has the form:


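$$
\varphi_X(t) = E\left\{e^{itX}\right\} = e^{it\mu}
\left(\frac{\alpha^2 - \beta^2}{\alpha^2 - (\beta + it)^2}\right)^{\lambda/2}
\frac{K_\lambda\left(\delta\sqrt{\alpha^2 - (\beta + it)^2}\right)}
{K_\lambda\left(\delta\sqrt{\alpha^2 - \beta^2}\right)}.
$$

The mean and the variance of $X \sim GH(\alpha, \beta, \delta, \mu, \lambda)$ are
given by the two formulas:

$$
E(X) = \mu + \frac{\beta\delta^2}{\gamma}\,\frac{K_{\lambda+1}(\gamma)}{K_\lambda(\gamma)}
$$

and

$$
Var(X) = \frac{\delta^2}{\gamma}\,\frac{K_{\lambda+1}(\gamma)}{K_\lambda(\gamma)}
+ \frac{\beta^2\delta^4}{\gamma^2}
\left(\frac{K_{\lambda+2}(\gamma)}{K_\lambda(\gamma)}
- \frac{K^2_{\lambda+1}(\gamma)}{K^2_\lambda(\gamma)}\right),
$$

with $\gamma = \delta\sqrt{\alpha^2 - \beta^2}$.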

R functions to obtain density, cumulative distribution function, quantiles and
random numbers for the generalized hyperbolic random variable are available in
package fBasics and are of the form [dpqr]gh.


2.3.3.15 Meixner distribution 


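The Meixner distribution originates from the theory of orthogonal polynomials
and was suggested as a model for financial data returns in Grigelionis (1999) and
Schoutens and Teugels (1998). The density of the Meixner($\alpha, \beta, \delta$)
distribution is given by

$$
f(x) = \frac{\left(2\cos\left(\frac{\beta}{2}\right)\right)^{2\delta}}{2\alpha\pi\,\Gamma(2\delta)}
\exp\left(\frac{\beta x}{\alpha}\right)
\left|\Gamma\left(\delta + \frac{ix}{\alpha}\right)\right|^2,
\qquad x \in \mathbb{R},
\tag{2.14}
$$

with $\alpha > 0$, $\beta \in (-\pi, \pi)$ and $\delta > 0$. The characteristic
function has the following simple form:

$$
\varphi_X(t) = E\left\{e^{itX}\right\} =
\left(\frac{\cos\left(\frac{\beta}{2}\right)}{\cosh\left(\frac{\alpha t - i\beta}{2}\right)}\right)^{2\delta}.
$$

For $X \sim$ Meixner($\alpha, \beta, \delta$) we have

$$
E(X) = \alpha\delta\tan\left(\frac{\beta}{2}\right), \qquad
Var(X) = \frac{\alpha^2\delta}{2\cos^2\left(\frac{\beta}{2}\right)}.
$$
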

2.3.3.16 Multivariate Gaussian random variable 

A random vector $X = (X_1, X_2, \ldots, X_n)^T$ follows a multivariate Gaussian
distribution with vector mean $\mu = (\mu_1, \mu_2, \ldots, \mu_n)^T$ and
variance-covariance matrix $\Sigma$ if the joint density of
$X = (X_1, \ldots, X_n)^T$ has the following form:

$$
f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}
\exp\left\{-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right\}
$$

with $x = (x_1, x_2, \ldots, x_n)^T$. In the above $|\Sigma|$ is the determinant of
the $n \times n$ positive definite (hence invertible) matrix $\Sigma$ and
$\Sigma^{-1}$ is the inverse matrix of $\Sigma$. For this random variable,
$E(X) = \mu$ and $Var(X) = \Sigma$ and we write $X \sim N(\mu, \Sigma)$. Its
characteristic function is given by the formula

$$
\varphi_X(t) = \exp\left\{i\mu^T t - \frac{1}{2}t^T\Sigma t\right\},
\qquad t \in \mathbb{R}^n.
$$


The following two sets of conditions are equivalent to the above:

(i) if every linear combination $Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ of
$(X_1, X_2, \ldots, X_n)$ is Gaussian distributed, then $X$ is a multivariate
normal;

(ii) let $Z = (Z_1, \ldots, Z_m)^T$ be a vector of independent normal random
variables N(0, 1) and let $\mu = (\mu_1, \ldots, \mu_n)^T$ be some vector, with
$A$ an $n \times m$ matrix. Then $X = AZ + \mu$ is a multivariate Gaussian
random variable.
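
Construction (ii) is also a practical recipe for simulation. The sketch below
is one possible way to use it, assuming that the matrix $A$ is taken as the
transposed Cholesky factor of a target covariance matrix, so that
$AA^T = \Sigma$; the numerical values of `Sigma` and `mu` and the seed are
arbitrary choices made for illustration.

```r
set.seed(123)                                # for reproducibility
Sigma <- matrix(c(1, 0.6, 0.6, 2), 2, 2)     # illustrative covariance matrix
mu <- c(1, -1)                               # illustrative mean vector

# chol() returns the upper triangular factor R with t(R) %*% R = Sigma,
# so A = t(R) satisfies A %*% t(A) = Sigma
A <- t(chol(Sigma))

n <- 1e5
Z <- matrix(rnorm(2 * n), nrow = 2)          # independent N(0, 1) entries
X <- A %*% Z + mu                            # each column is one draw of X

rowMeans(X)                                  # should be close to mu
cov(t(X))                                    # should be close to Sigma
```

The sample mean and sample covariance only approximate `mu` and `Sigma`, with
an error that shrinks as `n` grows.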

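Consider now a two-dimensional Gaussian vector with mean $\mu = (0, 0)^T$ and
variance-covariance matrix

$$
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}
= \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
$$

where $\rho = Cov(X_1, X_2)/\left(\sqrt{Var X_1}\sqrt{Var X_2}\right)$ is the
correlation coefficient between the two components $X_1$ and $X_2$. We have

$$
f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}
\exp\left\{-\frac{1}{2(1 - \rho^2)}
\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}
- \frac{2\rho x_1 x_2}{\sigma_1\sigma_2}\right)\right\}.
$$

From the above formula it is clear that when $Cov(X_1, X_2) = 0$ (and hence
$\rho = 0$) the joint density $f(x_1, x_2)$ factorizes into the product of the
two densities of $X_1$ and $X_2$, i.e. this condition is enough to deduce that
$X_1$ and $X_2$ are also independent.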

We have seen in Exercise 2.5 that in general null correlation does not imply
independence and we should also remark that what we have just shown is different
from saying that any two Gaussian random variables with null correlation are
always independent. What the previous result shows is that, if $X$ and $Y$ are
Gaussian and their joint distribution is also Gaussian, then null correlation
implies independence. Indeed, consider the following example by Melnick and
Tenenbein (1982): take $X \sim N(0, 1)$ and let $Y = -X$ if $|X| \le c$ and
$Y = X$ if $|X| > c$, where $c > 0$ is some constant. Looking at the definition
of $Y$ we see that if $c$ is very small, then $Y$ is essentially equal to $X$ so
we expect high positive correlation; the converse is true if $c$ is very large
(negative correlation). So there will be some value of $c$ such that the
correlation between $X$ and $Y$ is exactly zero. We now show that $Y$ is also
Gaussian.

$$
\begin{aligned}
P(Y \le x) &= P\left(\{(|X| \le c) \cap (-X \le x)\} \cup \{(|X| > c) \cap (X \le x)\}\right) \\
&= P\left((|X| \le c) \cap (-X \le x)\right) + P\left((|X| > c) \cap (X \le x)\right).
\end{aligned}
$$

Now, by the symmetry of $X$, $P(-X \le x) = P(X \ge -x) = P(X \le x)$ and, since
the event $\{|X| \le c\}$ is unchanged when $X$ is replaced by $-X$, also
$P((|X| \le c) \cap (-X \le x)) = P((|X| \le c) \cap (X \le x))$. Therefore
we conclude that $P(Y \le x) = P(X \le x)$. So $Y$ and $X$ are both Gaussian
(actually they have the same law), they may have null correlation but they are
not independent.
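
The value of $c$ that gives null correlation in this example can be located
numerically. The sketch below relies on a short calculation that is not in the
text: since $Var(X) = Var(Y) = 1$, the correlation equals
$E(XY) = 1 - 2E(X^2\mathbf{1}_{\{|X| \le c\}})$, which after integration by parts
becomes `3 - 4 * pnorm(c) + 4 * c * dnorm(c)`. The function names `rho` and
`rho.num` are hypothetical.

```r
# Correlation between X and Y as a function of the threshold cc
rho <- function(cc) 3 - 4 * pnorm(cc) + 4 * cc * dnorm(cc)

# Cross-check of the closed form by numerical integration of E(XY)
rho.num <- function(cc)
  1 - 2 * integrate(function(x) x^2 * dnorm(x), -cc, cc)$value

# rho is positive for small cc and negative for large cc: find the root
c0 <- uniroot(rho, c(0.5, 3), tol = 1e-10)$root
c0
c(rho(c0), rho.num(c0))   # both should be numerically zero
```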

In general, it is useful to know that the components of a multivariate Gaussian 
random vector are mutually independent if and only if the variance-covariance 
matrix is diagonal. 

The R function mvrnorm in package MASS or the function rmvnorm in
package mvtnorm can be used to obtain random numbers from the multivariate
normal distribution. Moreover, the package mvtnorm also implements
[dpq]mvnorm functions to obtain density function, cumulative distribution
function and quantiles.

2.3.4 Sum of random variables 

Although it is easy to derive the expected value of the sum of two random
variables $X$ and $Y$, it is less obvious to derive the distribution of
$Z = X + Y$. But if $X$ and $Y$ are independent, then the probability measure of
$Z$ is the convolution of the probability measures of $X$ and $Y$. Consider $X$
and $Y$ discrete random variables taking arbitrary integer values, then
$Z = X + Y$ can also take integer values. When $X = k$ then $Z = z$ if and only
if $Y = z - k$, hence the probability of the event $Z = z$ is the sum, over all
possible values of $k$, of the probabilities of events
$(X = k) \cap (Y = z - k)$. Given that $X$ and $Y$ are independent, we have

$$
P(Z = z) = \sum_k P(X = k)P(Y = z - k).
$$
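
The discrete convolution formula can be evaluated directly. As an illustration
(the choice of two fair dice is an assumption made here, not an example from
the text), the following sketch computes the law of the sum of two independent
dice and cross-checks it with the base R function `convolve`.

```r
# Two independent fair dice (illustrative choice): values 1, ..., 6
vals <- 1:6
p <- rep(1/6, 6)

# P(Z = z) = sum over k of P(X = k) P(Y = z - k)
z <- 2:12
pz <- sapply(z, function(zz) {
  k <- vals[(zz - vals) %in% vals]   # values k for which z - k is admissible
  sum(p[k] * p[zz - k])
})
names(pz) <- z
pz

# Same result through convolve(); the usual convolution of two sequences
# is obtained with rev() on the second argument and type = "open"
pz.conv <- convolve(p, rev(p), type = "open")
```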

If $X$ and $Y$ have density respectively $f_X(\cdot)$ and $f_Y(\cdot)$ with
support in $\mathbb{R}$, then, by the same reasoning, the density of
$Z = X + Y$ is given by

$$
f_Z(z) = \int_{\mathbb{R}} f_X(z - y)f_Y(y)\,dy = \int_{\mathbb{R}} f_Y(z - x)f_X(x)\,dx.
\tag{2.15}
$$

Sometimes the convolution is denoted by $f_Z(z) = (f_X * f_Y)(z)$.

Example 2.3.10 (Sum of uniform random variables) Let $X$ and $Y$ be two
independent uniform random variables U(0, 1) and consider $Z = X + Y$. We now
apply (2.15). First notice that $z$ can take any value in [0, 2]. Then

$$
f_Z(z) = \int_{\mathbb{R}} f_X(z - y)f_Y(y)\,dy = \int_0^1 f_X(z - y)\,dy.
$$

But $f_X(z - y) \ne 0$ only if $0 \le z - y \le 1$, i.e. $z - 1 \le y \le z$. We
now split the range of $z$ into [0, 1] and [1, 2]. So, if $z \in [0, 1]$ we have

$$
f_Z(z) = \int_0^z 1\,dy = z, \qquad 0 \le z \le 1,
$$

and, if $1 < z \le 2$,

$$
f_Z(z) = \int_{z-1}^1 1\,dy = 2 - z, \qquad 1 < z \le 2,
$$

and $f_Z(z) = 0$ if $z < 0$ or $z > 2$. Then

$$
f_Z(z) = \begin{cases}
z, & 0 \le z \le 1, \\
2 - z, & 1 < z \le 2, \\
0, & \text{otherwise.}
\end{cases}
$$


Exercise 2.12 (Sum of exponential random variables) Let $X \sim Exp(\lambda)$ and
$Y \sim Exp(\lambda)$, $\lambda > 0$, be two independent random variables. Find
the density of $Z = X + Y$.


Exercise 2.13 (Sum of Gaussian random variables) Let $X \sim N(\mu_1, \sigma_1^2)$
and $Y \sim N(\mu_2, \sigma_2^2)$ be two independent random variables. Prove that
$Z = X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.

Theorem 2.3.11 Let $X_1, X_2, \ldots, X_n$ be $n$ independent Gaussian random
variables respectively with means $\mu_i$ and variances $\sigma_i^2$,
$i = 1, \ldots, n$. Then,

$$
\sum_{i=1}^n X_i \sim N\left(\sum_{i=1}^n \mu_i, \sum_{i=1}^n \sigma_i^2\right).
$$

We now enumerate a few special cases of sums of random variables without proof,
though in most cases playing with the characteristic function is enough to obtain
the results.


(i) the NIG distribution is closed under convolution in the following
sense: if $X \sim NIG(\alpha, \beta, \delta_1, \mu_1)$ and
$Y \sim NIG(\alpha, \beta, \delta_2, \mu_2)$ are two independent random
variables, then

$$
X + Y \sim NIG(\alpha, \beta, \delta_1 + \delta_2, \mu_1 + \mu_2);
$$

(ii) if $X_i \sim Gamma(\alpha_i, \beta)$, $i = 1, \ldots, n$, is a sequence of
independent Gamma random variables, then

$$
X_1 + X_2 + \cdots + X_n \sim Gamma(\alpha_1 + \cdots + \alpha_n, \beta);
$$

(iii) if $X_i \sim N(\mu_i, \sigma^2)$, $i = 1, \ldots, n$, is a sequence of
independent Gaussian random variables, then

$$
X_1^2 + X_2^2 + \cdots + X_n^2 \sim \chi^2_n(\delta)
$$

where $\chi^2_n(\delta)$ is the noncentral Chi-square random variable with $n$
degrees of freedom and noncentral parameter $\delta$ given by



2.3.5 Infinitely divisible distributions 

Let $X$ be a random variable with distribution function $F_X(x)$ and
characteristic function $\varphi_X(u)$. We now introduce the notion of infinitely
divisible laws which represent a class of random variables whose distributions
are closed with respect to the convolution operator. This property will be
particularly useful in financial applications in relation to Lévy processes.

Definition 2.3.12 The distribution $F_X$ is infinitely divisible if, for all
$n \in \mathbb{N}$, there exist i.i.d. random variables, say
$X_1^{(1/n)}, X_2^{(1/n)}, \ldots, X_n^{(1/n)}$, such that

$$
X \sim X_1^{(1/n)} + X_2^{(1/n)} + \cdots + X_n^{(1/n)}.
$$

Equivalently, $F_X$ is infinitely divisible if, for all $n \in \mathbb{N}$, there
exists another law $F_{X^{(1/n)}}$ such that

$$
F_X(x) = F_{X^{(1/n)}} * F_{X^{(1/n)}} * \cdots * F_{X^{(1/n)}},
$$

i.e. $F_X(x)$ is the $n$-times convolution of $F_{X^{(1/n)}}$.

Not all random variables are infinitely divisible, but some notable cases are. We
show here a few examples because this property comes from very easy
manipulation of the characteristic functions. Indeed, we will make use of the
following characterization: if $X$ has an infinitely divisible law, then

$$
\varphi_X(u) = \left(\varphi_{X^{(1/n)}}(u)\right)^n.
$$

Example 2.3.13 (Gaussian case) The law of $X \sim N(\mu, \sigma^2)$ is infinitely
divisible. Indeed,

$$
\begin{aligned}
\varphi_X(u) &= \exp\left\{iu\mu - \frac{1}{2}u^2\sigma^2\right\}
= \exp\left\{n\left(\frac{iu\mu}{n} - \frac{1}{2}\frac{u^2\sigma^2}{n}\right)\right\} \\
&= \left(\exp\left\{\frac{iu\mu}{n} - \frac{1}{2}\frac{u^2\sigma^2}{n}\right\}\right)^n
= \left(\varphi_{X^{(1/n)}}(u)\right)^n
\end{aligned}
$$

with $X^{(1/n)} \sim N\left(\frac{\mu}{n}, \frac{\sigma^2}{n}\right)$.
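
The identity of Example 2.3.13 is easy to verify numerically with complex
arithmetic in R. In the sketch below the function name `phi.norm` and the
values of `mu`, `sigma2` and `n` are illustrative assumptions.

```r
# Characteristic function of N(mu, sigma2)
phi.norm <- function(u, mu, sigma2) exp(1i * u * mu - 0.5 * u^2 * sigma2)

mu <- 0.3; sigma2 <- 2; n <- 7        # illustrative values
u <- seq(-3, 3, by = 0.5)

lhs <- phi.norm(u, mu, sigma2)                 # phi_X(u)
rhs <- phi.norm(u, mu / n, sigma2 / n)^n       # (phi_{X^(1/n)}(u))^n
max(Mod(lhs - rhs))                            # zero up to rounding error
```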


Exercise 2.14 (Poisson case) Prove that the law of $X \sim Poi(\lambda)$ is
infinitely divisible.


We have seen in Example 2.3.10 that the sum of two uniform random variables
is not a rescaled uniform distribution (rather a triangular shaped distribution).
Similarly, it is possible to prove that the Binomial distribution is not infinitely
divisible, while the Gamma, the exponential, the Meixner and many others are.

The following result (see e.g. Sato (1999), Lemma 7.8) characterizes infinitely
divisible laws in terms of the so-called characteristic triplet.

Theorem 2.3.14 The law of a random variable $X$ is infinitely divisible if and
only if there exists a triplet $(b, c, \nu)$, with $b \in \mathbb{R}$, $c \ge 0$
and a measure $\nu(\cdot)$ satisfying $\nu(\{0\}) = 0$ and
$\int_{\mathbb{R}}(1 \wedge |x|^2)\nu(dx) < \infty$, such that

$$
E\left\{e^{iuX}\right\} = \exp\left\{ibu - \frac{u^2c}{2}
+ \int_{\mathbb{R}}\left(e^{iux} - 1 - iux\mathbf{1}_{\{|x| < 1\}}\right)\nu(dx)\right\}.
\tag{2.16}
$$


Equation (2.16) is also known as the Lévy-Khintchine formula. Notice that the
characteristic triplet for $X \sim N(\mu, \sigma^2)$ is
$(b = \mu, c = \sigma^2, \nu)$ where $\nu(A) = 0$ for all $A \subset \mathbb{R}$.
For $X \sim Poi(\lambda)$, we can take $\nu(\{x\}) = \delta(x - 1)\lambda$, where
$\delta$ is the Dirac delta function. Therefore, the characteristic triplet is
$(b = 0, c = 0, \nu)$. The exponent in Equation (2.16)

$$
\psi(u) = ibu - \frac{u^2c}{2}
+ \int_{\mathbb{R}}\left(e^{iux} - 1 - iux\mathbf{1}_{\{|x| < 1\}}\right)\nu(dx),
$$

is called Lévy or characteristic exponent.
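
For the Poisson case the Lévy-Khintchine representation can be checked
numerically. With $b = 0$, $c = 0$ and $\nu$ placing mass $\lambda$ at $x = 1$,
the exponent reduces to $\lambda(e^{iu} - 1)$, because the truncation term
vanishes at $x = 1$. The sketch below compares this with a direct evaluation of
$E\{e^{iuX}\}$ from the Poisson probabilities; the value of `lambda` and the
truncation of the series at 200 terms are illustrative assumptions.

```r
lambda <- 2.5                        # illustrative intensity
u <- seq(-4, 4, by = 0.5)

# Levy exponent with b = 0, c = 0 and nu = lambda * (unit mass at x = 1);
# the term i*u*x*1{|x| < 1} is zero at x = 1
psi <- lambda * (exp(1i * u) - 1)
lk <- exp(psi)

# Direct computation of E{exp(iuX)}, series truncated at k = 200
k <- 0:200
direct <- sapply(u, function(uu) sum(exp(1i * uu * k) * dpois(k, lambda)))
max(Mod(lk - direct))                # zero up to rounding error
```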

2.3.6 Stable laws 

Random variables with stable law are also often used in finance jointly with
Lévy processes to model asset prices. The notion of stable law emerged in the
study of the distribution of the sum of random variables and indeed, stability
preserves the family of distributions with respect to sum. A very recent account
on stable distributions and their applications is Nolan (2010).

Definition 2.3.15 A random variable $X$ is stable or stable in the broad sense if
for $X_1$ and $X_2$ independent copies of $X$ and positive constants $a$ and $b$
we have

$$
aX_1 + bX_2 \sim cX + d \tag{2.17}
$$

for some positive $c$ and some $d \in \mathbb{R}$. The random variable is
strictly stable or stable in the narrow sense if (2.17) holds with $d = 0$. A
random variable is symmetrically stable if it is stable and symmetrically
distributed around 0, i.e. $-X \sim X$.


Clearly, the Gaussian law is stable. 

Example 2.3.16 (Gaussian case) Let $X_1$ and $X_2$ be two independent copies of
$X \sim N(\mu, \sigma^2)$, then

$$
aX_1 + bX_2 \sim N\left((a + b)\mu, (a^2 + b^2)\sigma^2\right) \sim cX + d
$$

with $c^2 = a^2 + b^2$ and $d = \mu(a + b - c)$.

There are two other distributions which admit a closed form of the convolution
and which are also stable: the Cauchy distribution and the Lévy distribution
Lévy($\gamma, \delta$) with two parameters $\gamma > 0$ and $\delta$ and density

$$
f(x) = \sqrt{\frac{\gamma}{2\pi}}\,\frac{1}{(x - \delta)^{3/2}}
\exp\left\{-\frac{\gamma}{2(x - \delta)}\right\}, \qquad \delta < x < \infty.
$$

There are other equivalent definitions of stable laws which are useful. The first
one extends the definition to the sum of $n$ random variables.

Definition 2.3.17 A nondegenerate random variable $X$ is stable if and only if for
all $n > 1$, there exist constants $c_n > 0$ and $d_n \in \mathbb{R}$ such that

$$
X_1 + \cdots + X_n \sim c_nX + d_n,
$$

where $X_1, \ldots, X_n$ are i.i.d. as $X$. $X$ is strictly stable if and only if
it is stable and for all $n > 1$, $d_n = 0$.

The only possible choice of the scaling constant $c_n$ is $c_n = n^{1/\alpha}$,
for some $\alpha \in (0, 2]$, see Nolan (2010). We have already seen the central
role of the characteristic function in the analysis of the sum of independent
random variables. An equivalent definition is based indeed on the following
construction.


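Definition 2.3.18 A random variable $X$ is stable if and only if $X \sim aZ + b$,
where $a > 0$, $b \in \mathbb{R}$ and $Z$ is a random variable with parameters
$(\alpha, \beta)$, $-1 \le \beta \le 1$, $\alpha \in (0, 2]$, such that

$$
\varphi_Z(t) = \begin{cases}
\exp\left\{-|t|^\alpha\left(1 - i\beta\tan\left(\frac{\pi\alpha}{2}\right)\mathrm{sgn}(t)\right)\right\}, & \alpha \ne 1, \\
\exp\left\{-|t|\left(1 + i\beta\frac{2}{\pi}\mathrm{sgn}(t)\log|t|\right)\right\}, & \alpha = 1,
\end{cases}
\tag{2.18}
$$

and hence

$$
\varphi_X(t) = \begin{cases}
\exp\left\{itb - |at|^\alpha\left(1 - i\beta\tan\left(\frac{\pi\alpha}{2}\right)\mathrm{sgn}(t)\right)\right\}, & \alpha \ne 1, \\
\exp\left\{itb - |at|\left(1 + i\beta\frac{2}{\pi}\mathrm{sgn}(t)\log|at|\right)\right\}, & \alpha = 1.
\end{cases}
$$

In the above, when $\alpha = 1$, $0\log(0)$ is taken as 0 and
$\mathrm{sgn}(\cdot)$ is the sign function, i.e.

$$
\mathrm{sgn}(x) = \begin{cases}
-1, & x < 0, \\
0, & x = 0, \\
1, & x > 0.
\end{cases}
$$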


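These random variables will be denoted by $S(\alpha, \beta, \gamma, \delta)$, and
they have a symmetric distribution when $\beta = 0$ and $\delta = 0$, so that the
characteristic function of $Z$ in the definition takes the form:

$$
\varphi_Z(t) = e^{-|t|^\alpha}.
$$

It is possible to check that

$$
S\left(\alpha = 2, \beta = 0, \gamma = \frac{\sigma}{\sqrt{2}}, \delta = \mu\right) = N(\mu, \sigma^2)
$$

$$
S\left(\alpha = \frac{1}{2}, \beta = 1, \gamma = \gamma, \delta = \delta\right) = \text{Lévy}(\gamma, \delta)
$$

$$
S\left(\alpha = 1, \beta = 0, \gamma = \gamma, \delta = \delta\right) = \text{Cauchy}(\gamma, \delta)
$$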


Stable distributions are hence characterized by four parameters, where $\alpha$
is called index of stability or characteristic exponent, $\beta$ is the skewness
parameter, $\gamma > 0$ is the scale parameter and $\delta \in \mathbb{R}$ is the
location parameter. Notice that $\gamma$ is called scale parameter and not
standard deviation because even in the Gaussian case
$\gamma = \frac{\sigma}{\sqrt{2}} \ne \sigma$, and the same for $\delta$ which is
only a location parameter and not the mean (indeed, the Cauchy distribution has
no mean). The notation $S(\alpha, \beta, \gamma, \delta)$ is sometimes written as
$S(\alpha, \beta, \gamma, \delta; 1)$ where the last '1' denotes one kind of
parametrization. This means that stable laws can be parametrized in several
ways, and one should always check which version is used. In particular, the
parametrization of this book,
$S(\alpha, \beta, \gamma, \delta) = S(\alpha, \beta, \gamma, \delta; 1)$,
coincides with

$$
X \sim \begin{cases}
\gamma Z + \delta, & \alpha \ne 1, \\
\gamma Z + \delta + \beta\frac{2}{\pi}\gamma\log\gamma, & \alpha = 1,
\end{cases}
$$

and the parametrization $S(\alpha, \beta, \gamma, \delta; 0)$ with

$$
X \sim \begin{cases}
\gamma\left(Z - \beta\tan\left(\frac{\pi\alpha}{2}\right)\right) + \delta, & \alpha \ne 1, \\
\gamma Z + \delta, & \alpha = 1,
\end{cases}
$$

with $Z$ as in (2.18).

Figure 2.1 Shape of stable distributions $S(\alpha, \beta = 0.5, \gamma = 1, \delta = 0)$,
for $\alpha$ = 0.5, 0.75, 1, 1.25, 1.5.


R functions to obtain density, cumulative distribution function, quantiles and
random numbers for stable random variables are of the form [dpqr]stable and
available in the contributed R package called fBasics. The functions allow for
several parametrizations of stable distributions including the previous ones. The
next code produces the plot in Figure 2.1 using the function dstable which by
default sets gamma = 1 and delta = 0. The argument pm denotes the parametrization
in use. The plot shows the different shapes assumed by the stable distribution as
a function of $\alpha$ for the stable law
$S(\alpha, \beta = 0.5, \gamma = 1, \delta = 0)$.

R> # in recent fBasics releases the stable-law functions may instead be
R> # provided by package stabledist
R> require(fBasics)
R> x <- seq(-10, 10, length = 500)
R> y1 <- function(x) dstable(x, alpha = 0.5, beta = 0.5, pm = 1)
R> y2 <- function(x) dstable(x, alpha = 0.75, beta = 0.5, pm = 1)
R> y3 <- function(x) dstable(x, alpha = 1, beta = 0.5, pm = 1)
R> y4 <- function(x) dstable(x, alpha = 1.25, beta = 0.5, pm = 1)
R> y5 <- function(x) dstable(x, alpha = 1.5, beta = 0.5, pm = 1)
R> curve(y1, -5, 5, lty = 6, ylim = c(0, 0.6), ylab = "density")
R> curve(y2, -5, 5, lty = 2, add = TRUE)
R> curve(y3, -5, 5, lty = 3, add = TRUE)
R> curve(y4, -5, 5, lty = 4, add = TRUE)
R> curve(y5, -5, 5, lty = 1, add = TRUE)
R> legend(-4, 0.6, legend = c(expression(alpha == 0.5), expression(alpha ==
+ 0.75), expression(alpha == 1), expression(alpha == 1.25),
+ expression(alpha == 1.5)), lty = c(6, 2, 3, 4, 1))

Similarly, the next code shows the skewness of the stable distribution
$S(\alpha = 1, \beta, \gamma = 1, \delta = 0)$ as a function of
$\beta$ = −0.99, −0.3, 0, 0.5, 0.7. The resulting plot is given in Figure 2.2.

Figure 2.2 Shape of stable distributions $S(\alpha = 1, \beta, \gamma = 1, \delta = 0)$,
for $\beta$ = −0.99, −0.3, 0, 0.5, 0.7.


R> require(fBasics)
R> x <- seq(-10, 10, length = 500)
R> y1 <- function(x) dstable(x, alpha = 1, beta = -0.99, pm = 1)
R> y2 <- function(x) dstable(x, alpha = 1, beta = -0.3, pm = 1)
R> y3 <- function(x) dstable(x, alpha = 1, beta = 0, pm = 1)
R> y4 <- function(x) dstable(x, alpha = 1, beta = 0.5, pm = 1)
R> y5 <- function(x) dstable(x, alpha = 1, beta = 0.7, pm = 1)
R> curve(y1, -3, 3, lty = 6, ylim = c(0, 0.35), ylab = "density")
R> curve(y2, -3, 3, lty = 2, add = TRUE)
R> curve(y3, -3, 3, lty = 1, add = TRUE)
R> curve(y4, -3, 3, lty = 4, add = TRUE)
R> curve(y5, -3, 3, lty = 3, add = TRUE)
R> legend(-3, 0.33, legend = c(expression(beta == -0.99), expression(beta ==
+ -0.3), expression(beta == 0), expression(beta == 0.5), expression(beta ==
+ 0.7)), lty = c(6, 2, 1, 4, 3))


2.3.7 Fast Fourier Transform 

We have seen the central role of characteristic functions and we know that each
random variable has one and only one characteristic function and vice versa.
The characteristic function of a random variable $X$ with density $f(\cdot)$ can
be expressed as

$$
\varphi(t) = E\left\{e^{itX}\right\} = \int_{-\infty}^{\infty} f(x)e^{itx}\,dx.
\tag{2.19}
$$












It is always possible to recover the density or the distribution function of a 
random variable from its characteristic function as the following theorem shows 
(see e.g. Kendall and Stuart 1977). 

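Theorem 2.3.19 (Inversion theorem) Let $\varphi(t)$ be the characteristic function
of a random variable with distribution function $F(x)$ and density function
$f(x)$. Then,

$$
F(x) = \frac{1}{2} - \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-itx}\varphi(t)}{it}\,dt
= F(0) + \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-itx}}{it}\varphi(t)\,dt
$$

and

$$
f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt.
\tag{2.20}
$$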


Although the previous theorem established a direct link between $\varphi(\cdot)$,
$F(\cdot)$ and $f(\cdot)$, closed form results are rarely obtained and in most
cases the solution is obtained by numerical methods. In particular, numerical
approximation of the integral (2.20) has to be calculated by some quadrature
formula which is based on the discretization along the integration variable $t$,
for each given $x$. So to obtain the shape of the function $f(x)$ in (2.20) one
also needs to discretize the $x$ axis. Assuming that $N$ points are chosen for
the $x$ grid and $N$ for the $t$ grid, this numerical problem requires at least
$N^2$ operations to be performed. In this respect, one of the most important
advances in numerical analysis was the Fast Fourier Transform (FFT) algorithm by
Cooley and Tukey (1965). Although there is no need to know the details, it is
worth understanding that the merit of this algorithm is to reduce the
computational burden from order $N^2$ to order $N\log_2(N)$, which is the reason
for the adjective 'fast'. Indeed, if $N = 1000$, then $N^2 = 1,000,000$ but
$N\log_2(N) = 9,965.784$. This algorithm allows us to calculate a discrete
version of the Fourier transform or the inverse of a given discretized Fourier
transform. For a given vector of complex numbers $x_n$, $n = 1, \ldots, N$, the
FFT algorithm efficiently calculates the discrete Fourier transform

$$
X_k = \sum_{n=1}^{N} x_n e^{-i\frac{2\pi}{N}(k-1)(n-1)}, \qquad \text{for } k = 1, \ldots, N
\tag{2.21}
$$

and, starting from the sequence $X_k$, the values of the inverse of the discrete
Fourier transform

$$
x_n = \frac{1}{N}\sum_{k=1}^{N} X_k e^{i\frac{2\pi}{N}(k-1)(n-1)}, \qquad \text{for } n = 1, \ldots, N.
\tag{2.22}
$$
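
To make the definitions (2.21) and (2.22) concrete, the following sketch
evaluates (2.21) naively, with order $N^2$ operations, and compares the result
with the R function `fft`. The helper `dft.naive` and the input vector are
hypothetical choices made for illustration.

```r
# Naive O(N^2) evaluation of (2.21); dft.naive is a hypothetical helper
dft.naive <- function(x) {
  N <- length(x)
  idx <- 0:(N - 1)
  # N x N matrix with entries exp(-2*pi*i*(k-1)*(n-1)/N)
  W <- exp(-2i * pi * outer(idx, idx) / N)
  as.vector(W %*% as.complex(x))
}

x <- c(1, 2, 0, -1, 3, 0.5, -2, 4)       # illustrative input vector
N <- length(x)

X.naive <- dft.naive(x)
X.fft <- fft(x)                           # direct transform, as in (2.21)
x.back <- fft(X.fft, inverse = TRUE) / N  # (2.22); fft() omits the factor 1/N

max(Mod(X.naive - X.fft))                 # zero up to rounding error
max(Mod(x.back - x))                      # zero up to rounding error
1000 * log2(1000)                         # 9965.784, versus 1000^2 = 1e+06
```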

The expression in (2.22) is close to a quadrature formula for the integral (2.19)
and (2.21) is close to the quadrature of (2.20). In order to see the exact
correspondence between the formulas, we need to manipulate them, but first we
notice that the (direct) FFT corresponds to the inversion formula for the
characteristic function (2.20), while the inverse FFT corresponds to the
characteristic function (2.19). Let $g(t) = e^{-itx}\varphi(t)$, then

$$
g(-t) = e^{itx}\varphi(-t) = e^{itx}\,\overline{\varphi(t)} = \overline{g(t)}
\quad\text{and}\quad
g(t) + g(-t) = 2\,\mathrm{Re}\left\{e^{-itx}\varphi(t)\right\}.
$$

We can rewrite (2.20) as follows:

$$
\begin{aligned}
f(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt
= \frac{1}{2\pi}\int_{-\infty}^{0} g(t)\,dt + \frac{1}{2\pi}\int_{0}^{\infty} g(t)\,dt \\
&= \mathrm{Re}\left\{\frac{1}{\pi}\int_{0}^{\infty} e^{-itx}\varphi(t)\,dt\right\}.
\end{aligned}
$$

We now discretize the last integral using a grid of points
$t_n = \Delta_t(n - 1)$, $n = 1, \ldots, N$, so that in practice we evaluate the
integral on the interval $[0, T = t_N = (N - 1)\Delta_t]$ instead of the interval
$(0, +\infty)$. If $N$ is relatively big, then the truncation of the integral
will not affect the approximation too much. We obtain the following
approximation:

$$
f(x) \simeq \mathrm{Re}\left\{\frac{1}{\pi}\sum_{n=1}^{N} e^{-it_nx}\varphi(t_n)\Delta_t\right\}.
$$
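
The quality of this discretization can be inspected on a case where the answer
is known. The sketch below applies the sum to the characteristic function
$e^{-t^2/2}$ of the standard Gaussian and compares the result with `dnorm`; the
grid size `N`, the step `dt` and the evaluation points are arbitrary choices
made for illustration.

```r
phi <- function(t) exp(-t^2 / 2)     # characteristic function of N(0, 1)

N <- 2048; dt <- 0.01                # illustrative grid choices
tn <- dt * (0:(N - 1))               # t_n = dt * (n - 1), n = 1, ..., N

# f(x) is approximated by Re{ (1/pi) * sum_n exp(-i t_n x) phi(t_n) dt }
f.approx <- function(x) Re(sum(exp(-1i * tn * x) * phi(tn)) * dt / pi)

x <- c(-2, -1, 0, 0.5, 1.5)
res <- cbind(x, approx = sapply(x, f.approx), exact = dnorm(x))
res
```

With this simple left-endpoint rule the two columns agree only to about two
decimal places; a smaller `dt` (with `N` increased accordingly) tightens the
agreement.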

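The value of $\Delta_t$, which is crucial, will be specified later. Now, assume
that the function $f(x)$ has a finite support $[x_{min}, x_{max}]$ and we set
$x_{min} = 0$ to simplify the exposition. Then take
$\Delta_x = (x_{max} - x_{min})/(N - 1)$ and set
$x_k = x_{min} + \Delta_x(k - 1) = \Delta_x(k - 1)$, for $k = 1, \ldots, N$.
Therefore, for each $x_k$ we have

$$
f(x_k) \simeq \mathrm{Re}\left\{\frac{1}{\pi}\sum_{n=1}^{N}
e^{-i\Delta_t\Delta_x(n-1)(k-1)}\varphi(t_n)\Delta_t\right\}.
$$

The crucial position is now to impose the condition
$\Delta_x\Delta_t = \frac{2\pi}{N}$, which implies

$$
\Delta_t = \frac{2\pi}{N}\,\frac{N - 1}{x_{max} - x_{min}} \simeq \frac{2\pi}{x_{max} - x_{min}}
$$

for large $N$. So finally we have

$$
f(x_k) \simeq \mathrm{Re}\left\{\frac{\Delta_t}{\pi}\sum_{n=1}^{N}
e^{-i\frac{2\pi}{N}(n-1)(k-1)}\varphi(t_n)\right\}.
$$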

Taking $X_k = f(x_k)$ and $x_n = \varphi(t_n)$ we obtain, up to normalizing
constants, a version of (2.21). A similar manipulation allows us to express
(2.22). All those manipulations are made by the software interface, so now we
explain how to compute the transform (2.22) and its inverse (2.21) with R. The
algorithm for the FFT is designed to efficiently calculate expressions of the
form (2.21). So, if we want to obtain the characteristic function from the FFT
we need to use the inverse of the FFT. R is not special in this sense because
the function fft calculates exactly (2.21), so to obtain the characteristic
function we need to use the argument inverse = TRUE (the default being FALSE) in
the function fft. Indeed, R just executes the same algorithm fft plugging a sign
'+' in the exponential. Notice that the normalizing factor $1/N$ is missing.
Assume we take the density $f(\cdot)$ of the standard Gaussian N(0, 1) and let us
calculate the characteristic function on a grid of points over the interval
(−3, 3).

R> x <- seq(-3, 3, length = 20) 

R> f <- function(x) dnorm(x) 

R> f(x) 

[1] 0.004431848 0.010873446 0.024145731 0.048529339 0.088279375 
0.145346632 

[7] 0.216591572 0.292125176 0.356604876 0.394000182 0.394000182 
0.356604876 

[13] 0.292125176 0.216591572 0.145346632 0.088279375 0.048529339 
0.024145731 

[19] 0.010873446 0.004431848 


Then, we calculate the characteristic function using the R function fft


R> y <- fft(f(x), inverse = TRUE)
R> y

 [1]  3.161856e+00+0.000000e+00i -1.911250e+00+3.027123e-01i
 [3]  4.125760e-01-1.340541e-01i -3.528574e-02+1.797898e-02i
 [5] -6.619828e-04+4.809586e-04i -9.869481e-04+9.869481e-04i
 [7] -5.609033e-04+7.720172e-04i -2.895818e-04+5.683363e-04i
 [9] -1.211024e-04+3.727148e-04i -2.920041e-05+1.843641e-04i
[11] -2.220446e-16+0.000000e+00i -2.920041e-05-1.843641e-04i
[13] -1.211024e-04-3.727148e-04i -2.895818e-04-5.683363e-04i
[15] -5.609033e-04-7.720172e-04i -9.869481e-04-9.869481e-04i
[17] -6.619828e-04-4.809586e-04i -3.528574e-02-1.797898e-02i
[19]  4.125760e-01+1.340541e-01i -1.911250e+00-3.027123e-01i


Now y is the output of the FFT algorithm in (2.22) without the normalizing factor
$1/N$. Therefore, to get the density back from the FFT we proceed as follows:


R> invFFT <- as.numeric(fft(y)/length(y)) 

R> invFFT 

[1] 0.004431848 0.010873446 0.024145731 0.048529339 0.088279375 
0.145346632 

[7] 0.216591572 0.292125176 0.356604876 0.394000182 0.394000182 
0.356604876 

[13] 0.292125176 0.216591572 0.145346632 0.088279375 0.048529339 
0.024145731 

[19] 0.010873446 0.004431848

where as.numeric transforms the complex vector into one of real numbers by
dropping the imaginary part. We will discuss in more depth the use of the inverse
Fourier transform as an alternative to the Monte Carlo method, or the exact
formulae, for the calculation of option prices, in Section 8.1.5.


2.3.8 Inequalities 

There are some fundamental inequalities in the calculus of probability which are
often used in proofs, so we collect them here as a reference. We review them
below, providing only hints of the proof for some of them.

Theorem 2.3.20 (Chebyshev's inequality) Let $X$ be a random variable with
expected value $E(X) = \mu$ and let $\epsilon > 0$ be any positive real number.
Then

$$
P(|X - \mu| \ge \epsilon) \le \frac{Var(X)}{\epsilon^2}.
\tag{2.23}
$$

Proof. Remember that we can always write
$Var(X) = \int_{\mathbb{R}} |x - \mu|^2\,dF(x)$. Because this is a sum of positive
terms, if we restrict the summation to the set on which
$|X - \mu| \ge \epsilon$ we have that

$$
Var(X) \ge \int_{\mathbb{R}} \mathbf{1}_{\{|x - \mu| \ge \epsilon\}}|x - \mu|^2\,dF(x).
$$

The proof ends by noting that, on this subset of $\mathbb{R}$, all terms
$|x - \mu|^2$ are at least $\epsilon^2$ and we have that

$$
\int_{\mathbb{R}} \mathbf{1}_{\{|x - \mu| \ge \epsilon\}}|x - \mu|^2\,dF(x)
\ge \epsilon^2\int_{\mathbb{R}} \mathbf{1}_{\{|x - \mu| \ge \epsilon\}}\,dF(x)
= \epsilon^2 P(|X - \mu| \ge \epsilon).
$$

A generalization of this inequality, called Chebyshev-Markov inequality, is the
following:

$$
P(|X| \ge \epsilon) \le \frac{E|X|^p}{\epsilon^p}
\tag{2.24}
$$

which is well defined when all quantities exist. Similarly we have the
Chebyshev-Cantelli inequality:

$$
P(X - E(X) \ge \epsilon) \le \frac{Var(X)}{Var(X) + \epsilon^2}.
$$
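
These bounds are usually far from tight, which a short computation makes
visible. The sketch below compares the exact tail probabilities of a standard
Gaussian (an illustrative choice, with $\mu = 0$ and $Var(X) = 1$) with the
Chebyshev bound (2.23) and the Chebyshev-Cantelli bound.

```r
eps <- c(0.5, 1, 1.5, 2, 3)

# Standard Gaussian: mu = 0, Var(X) = 1
p.two <- 2 * pnorm(-eps)        # P(|X - mu| >= eps)
p.one <- pnorm(-eps)            # P(X - mu >= eps)

cbind(eps,
      p.two, chebyshev = 1 / eps^2,
      p.one, cantelli = 1 / (1 + eps^2))
```

Each probability stays below the bound printed next to it; for small values of
`eps` the Chebyshev bound exceeds one and is therefore uninformative.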


Theorem 2.3.21 (Lyapunov's inequality) Let $X$ be a random variable with
finite moments up to order $r$. Then

$$
\left(E|X|^p\right)^{\frac{1}{p}} \le \left(E|X|^r\right)^{\frac{1}{r}} \qquad \text{for } 0 < p \le r.
$$

Theorem 2.3.22 (Cauchy-Schwarz-Bunyakovsky inequality) Let $X$ and $Y$ be
two square integrable random variables, then

$$
|E(XY)| \le \sqrt{E(X^2)E(Y^2)}.
$$

Theorem 2.3.23 (Hölder inequality) Let $p, q \in (1, \infty)$ be such that
$\frac{1}{p} + \frac{1}{q} = 1$. Let $X$ and $Y$ be two random variables such
that $\left(E|X|^p\right)^{\frac{1}{p}} < \infty$ and
$\left(E|Y|^q\right)^{\frac{1}{q}} < \infty$, then

$$
|E(XY)| \le \left(E|X|^p\right)^{\frac{1}{p}}\left(E|Y|^q\right)^{\frac{1}{q}}.
$$

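Definition 2.3.24 A function $f(\cdot)$, $f : \mathbb{R} \to \mathbb{R}$, is said
to be convex if for any $(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and
non-negative constants $a_1, a_2, \ldots, a_n$ such that $\sum_{i=1}^n a_i = 1$,
then

$$
f\left(\sum_{i=1}^n a_ix_i\right) \le \sum_{i=1}^n a_if(x_i).
$$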

Theorem 2.3.25 (Jensen's inequality) Let $f(\cdot)$ be any real-valued convex
function on $\mathbb{R}$ and $X$ a random variable with finite expectation. Then

$$
f(E\{X\}) \le E\{f(X)\}.
$$

Proof. We present the proof for discrete random variables. Let $X$ be a discrete
random variable taking a finite number of values, then by definition of convex
function, with $a_i = P(X = x_i) = p_X(x_i)$, we have

$$
f\left(\sum_{i=1}^n x_ip_X(x_i)\right) \le \sum_{i=1}^n f(x_i)p_X(x_i) = E\{f(X)\}.
$$

From Jensen's inequality it immediately follows that $|E(X)| \le E|X|$.
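
A small numerical instance of Jensen's inequality, in the discrete setting used
in the proof: the values `x`, the probabilities `p` and the convex function
$f(x) = e^x$ below are illustrative assumptions.

```r
x <- c(-1, 0, 2, 5)              # illustrative values of X
p <- c(0.1, 0.4, 0.3, 0.2)       # their probabilities (they sum to 1)
f <- function(x) exp(x)          # a convex function

lhs <- f(sum(x * p))             # f(E{X})
rhs <- sum(f(x) * p)             # E{f(X)}
c(lhs, rhs)                      # lhs does not exceed rhs

# Special case f(x) = |x|, giving |E(X)| <= E|X|
c(abs(sum(x * p)), sum(abs(x) * p))
```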

Theorem 2.3.26 Let $X$ be a random variable and $f(\cdot)$, $g(\cdot)$ monotone
nondecreasing measurable functions. Then

$$
E\left(f(X)g(X)\right) \ge E\{f(X)\}E\{g(X)\},
$$

provided all expectations are finite. If $f(\cdot)$ is monotone increasing and
$g(\cdot)$ monotone decreasing, then

$$
E\left(f(X)g(X)\right) \le E\{f(X)\}E\{g(X)\}.
$$

Theorem 2.3.27 (Kolmogorov inequality) Let $X_i$, $i = 1, 2, \ldots, n$ be
independent random variables, with $E(X_i) = 0$ and $E(X_i^2) < \infty$, then

$$
P\left(\max_{1 \le k \le n}|X_1 + X_2 + \cdots + X_k| \ge \epsilon\right)
\le \frac{1}{\epsilon^2}\sum_{i=1}^n E\left(X_i^2\right),
$$

for all $\epsilon > 0$.

2.4 Asymptotics

Sequences of random variables can be defined in a natural way, but because of 
their structure, convergence is intended in a slightly different way from what is 
usually the case in basic calculus courses. In particular, measurability and distri¬ 
butions of these random objects define different types of convergence. Finally, 
for what matters to statistics and finance, some particular sequences have very 
peculiar limits as we will see. 

2.4.1 Types of convergences 

Definition 2.4.1 (Convergence in distribution) Let $\{F_n, n \in \mathbb{N}\}$ be
a sequence of distribution functions for the sequence of random variables
$\{X_n, n \in \mathbb{N}\}$. Assume that

$$
\lim_{n \to \infty} F_n(x) = F_X(x)
$$

for all $x \in \mathbb{R}$ such that $F_X(\cdot)$ is continuous in $x$, where
$F_X$ is the distribution function of some random variable $X$. Then, the
sequence $X_n$ is said to converge in distribution to the random variable $X$,
and this is denoted by $X_n \xrightarrow{d} X$.

This convergence only means that the distributions $F_n$ of the random variables
converge to another distribution $F_X$, but nothing is said about the random
variables. So this convergence is only about the probabilistic behavior of the
random variables on some intervals $(-\infty, x]$, $x \in \mathbb{R}$.

Definition 2.4.2 (Weak convergence) A sequence of random variables $X_n$
weakly converges to $X$ if for all smooth functions $f(\cdot)$, we have

$$
\lim_{n \to \infty}\int_{\mathbb{R}} f(x)\,dF_n(x) = \int_{\mathbb{R}} f(x)\,dF_X(x)
$$

and we write $X_n \xrightarrow{w} X$.

Theorem 2.4.3 A sequence of random variables $X_n$ weakly converges to $X$ if and
only if it also converges in distribution.

So the previous result says that there is no difference between weak convergence
and convergence in distribution, so one of the two can be used as a criterion to
prove the other.

Definition 2.4.4 (Convergence in probability) A sequence of random variables
$X_n$ is said to converge in probability to a random variable $X$ if, for any $\epsilon > 0$, the
following limit holds true

$$\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0.$$

This is denoted by $X_n \xrightarrow{p} X$ and it is the pointwise convergence of probabilities.
This convergence implies the convergence in distribution. Sometimes we use the
notation

$$\operatorname*{p-lim}_{n \to \infty} |X_n - X| = 0$$

for the convergence in probability. A stronger type of convergence is defined
as the probability of the limit in the sense $P(\lim_{n \to \infty} X_n = X) = 1$ or, more
precisely,

Definition 2.4.5 (Almost sure convergence) The sequence $X_n$ is said to converge
almost surely to $X$ if

$$P\left(\left\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\}\right) = 1.$$

When this happens we write $X_n \xrightarrow{a.s.} X$.

Almost sure convergence implies convergence in probability.

Definition 2.4.6 (r-th mean convergence) A sequence of random variables $X_n$
is said to converge in the r-th mean to a random variable $X$ if

$$\lim_{n \to \infty} E|X_n - X|^r = 0, \quad r \ge 1,$$

and we write $X_n \xrightarrow{L^r} X$.

The convergence in the r-th mean implies the convergence in probability thanks
to Chebyshev’s inequality, and if $X_n$ converges to $X$ in the r-th mean, then it also
converges in the s-th mean for all $r > s \ge 1$. Mean square convergence is a
particular case of interest for stochastic analysis and corresponds to the case $r = 2$.

Theorem 2.4.7 If for each $\epsilon > 0$ we have $\sum_{n=1}^{\infty} P(|X_n - X| > \epsilon) < \infty$, then
$X_n \xrightarrow{a.s.} X$.


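Example 2.4.8 Let $X \sim U(0, 1)$ and consider the sequence

$$X_n(\omega) = \begin{cases} 0, & 0 \le X(\omega) < \frac{1}{n^2}, \\ X(\omega), & \frac{1}{n^2} \le X(\omega) \le 1, \end{cases}$$

for $n = 1, 2, \ldots$. Then

$$\sum_{n=1}^{\infty} P(|X_n - X| > \epsilon) \le \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty$$

for any $\epsilon > 0$, because $|X_n - X| > 0$ only on the interval $(0, 1/n^2)$, hence
$P(|X_n - X| > \epsilon) \le P(X < 1/n^2) = 1/n^2$. Therefore, $X_n \xrightarrow{a.s.} X$ by Theorem 2.4.7.
Moreover,

$$E|X_n - X|^2 = \int_0^{1/n^2} x^2\,dx = \frac{1}{3n^6} \to 0.$$

Then $X_n$ converges also in quadratic mean to $X$.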
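The following implications $\Rightarrow$ hold in general:

$$X_n \xrightarrow{a.s.} X \;\Rightarrow\; X_n \xrightarrow{p} X \;\Rightarrow\; X_n \xrightarrow{d} X \;\Leftrightarrow\; X_n \xrightarrow{w} X,$$

$$\forall r > 0: \quad X_n \xrightarrow{L^r} X \;\Rightarrow\; X_n \xrightarrow{p} X,$$

$$\forall r > s \ge 1: \quad X_n \xrightarrow{L^r} X \;\Rightarrow\; X_n \xrightarrow{L^s} X.$$

Further, if a sequence of random variables $X_n$ converges in distribution to some
constant $c < \infty$, i.e. $X_n \xrightarrow{d} c$, then it also converges in probability to the same
constant, i.e. $X_n \xrightarrow{p} c$.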

Theorem 2.4.9 (Slutsky’s) Let $X_n$ and $Y_n$ be two sequences of random variables
such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$, where $X$ is a random variable and $c$ a constant.
Then

$$X_n + Y_n \xrightarrow{d} X + c \quad \text{and} \quad X_n Y_n \xrightarrow{d} cX.$$

Theorem 2.4.10 (Continuous mapping theorem) If $X_n \xrightarrow{d} X$ and $h(\cdot)$ is a
function such that the probability of $X$ taking values where $h(\cdot)$ is
discontinuous is zero, then $h(X_n) \xrightarrow{d} h(X)$.


2.4.2 Law of large numbers 


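Theorem 2.4.11 Let $\{X_n, n = 1, \ldots\}$ be a sequence of independent and
identically distributed random variables with $E(X_n) = \mu < \infty$ and $\mathrm{Var}(X_n) = \sigma^2 < \infty$
for all $n$. Let $S_n = \sum_{i=1}^n X_i$. Then

$$\frac{S_n}{n} \xrightarrow{p} \mu.$$

Proof. We simply need to use Chebyshev’s inequality (2.23). In fact, take
$\epsilon > 0$; then

$$P\left(\left|\frac{S_n}{n} - \mu\right| \ge \epsilon\right) \le \frac{\mathrm{Var}(S_n/n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty.$$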
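The statement can be illustrated numerically. The following R sketch is only an illustration under assumed settings that do not come from the text (exponential variables with mean 2, an arbitrary seed, and arbitrary sample sizes): it computes the running mean $S_n/n$ and compares a Monte Carlo estimate of $P(|S_n/n - \mu| \ge \epsilon)$ with the Chebyshev bound $\sigma^2/(n\epsilon^2)$ used in the proof.

```r
# Illustrative sketch: exponential draws with mean mu (an assumed choice)
set.seed(123)                      # arbitrary seed, for reproducibility
mu    <- 2                         # E(X_i); for the exponential law sd = mean
sigma <- 2
n     <- 10000
x     <- rexp(n, rate = 1 / mu)    # i.i.d. sample
running.mean <- cumsum(x) / seq_len(n)   # S_k / k for k = 1, ..., n

# Monte Carlo estimate of P(|S_m/m - mu| >= eps) versus Chebyshev's bound
eps  <- 0.5
m    <- 200
reps <- 2000
means <- replicate(reps, mean(rexp(m, rate = 1 / mu)))
p.hat <- mean(abs(means - mu) >= eps)
bound <- sigma^2 / (m * eps^2)     # Chebyshev bound sigma^2 / (m * eps^2)
c(estimate = p.hat, chebyshev.bound = bound)
```

The bound is typically far from tight, which is consistent with the fact that Chebyshev’s inequality uses only the first two moments.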


It is also possible to remove the requirement on the finiteness of the second 
moment using a different proof based on characteristic functions. 

Theorem 2.4.12 Let $\{X_n, n = 1, \ldots\}$ be a sequence of independent and
identically distributed random variables with $E(X_n) = \mu < \infty$. Then $\frac{S_n}{n} \xrightarrow{d} \mu$.
Proof. We make use of the properties of the characteristic functions. In
particular we use (2.7) with $a_i = 1/n$. Let $\varphi_X(t)$ denote the characteristic function
of $X_i$ and $\varphi_n(t)$ the characteristic function of $S_n/n$. Then

$$\varphi_n(t) = \left[\varphi_X\left(\frac{t}{n}\right)\right]^n.$$

We now use Taylor expansion and obtain $\varphi_X(t) = 1 + \varphi_X'(0)\,t + o(t)$ with $\varphi_X'(0) = i\mu$, hence

$$\varphi_X\left(\frac{t}{n}\right) = 1 + \frac{i\mu t}{n} + o\left(\frac{1}{n}\right).$$

Therefore, remembering that $\lim_{n \to \infty}(1 + a/n)^n = e^a$, we get

$$\lim_{n \to \infty} \varphi_n(t) = e^{i\mu t},$$

which is the characteristic function of the degenerate random variable taking only
the value $\mu$. Then we have proved that $\frac{S_n}{n} \xrightarrow{d} \mu$ and hence also $\frac{S_n}{n} \xrightarrow{p} \mu$.

The previous results concern only the distributional properties of the arithmetic
mean of independent random variables, so they are called weak laws of large
numbers. But we can notice that, for any $n$, we have

$$E\left(\frac{S_n}{n}\right) = \frac{n\mu}{n} = \mu,$$

so we can expect a stronger result. This is indeed the case and the limit theorem
is called the strong law of large numbers.

Theorem 2.4.13 (Strong L.L.N.) Let $\{X_n, n = 1, \ldots\}$ be a sequence of independent
and identically distributed random variables with $E|X_n|^4 = M < \infty$. Then

$$\frac{S_n}{n} \xrightarrow{a.s.} \mu.$$

To prove the strong law of large numbers we need the following lemma, which we
state without proof.

Lemma 2.4.14 (Borel-Cantelli) Let $A_n$ be a sequence of events. If

$$\sum_{n=1}^{\infty} P(A_n) < \infty \quad \text{then} \quad P\left(\left\{\omega : \lim_{n \to \infty} \mathbf{1}_{A_n}(\omega) = 0\right\}\right) = 1.$$

If the events $A_n$ are mutually independent, the converse is also true.

Loosely speaking, the Borel-Cantelli Lemma says that if, for a sequence of events
$A_n$, the series of probabilities is convergent, then in the limit the events $A_n$ will
not occur, with probability one.
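Proof. [of Theorem 2.4.13] We assume $E(X_i) = 0$, otherwise we just
consider $Y_i = X_i - \mu$ and then proceed with the same proof. Simple calculation
gives

$$E\{S_n^4\} = nE\{X_i^4\} + 3n(n - 1)\left(E\{X_i^2\}\right)^2 \le nM + 3n^2\sigma^4.$$

Now we use Chebyshev-Markov’s inequality (2.24) with power four and obtain,
for any $\epsilon > 0$, that

$$P\left(\left|\frac{S_n}{n}\right| \ge \epsilon\right) = P(|S_n| \ge n\epsilon) \le \frac{E\{S_n^4\}}{n^4\epsilon^4} \le \frac{M}{\epsilon^4 n^3} + \frac{3\sigma^4}{\epsilon^4 n^2}.$$

So, the events $A_n = \left\{\left|\frac{S_n}{n}\right| \ge \epsilon\right\}$ are such that

$$\sum_{n=1}^{\infty} P(A_n) < \infty$$

and applying the Borel-Cantelli Lemma we get almost sure convergence.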


2.4.3 Central limit theorem 


Theorem 2.4.15 Let $\{X_n, n = 1, \ldots\}$ be a sequence of independent and
identically distributed random variables with $E(X_n) = \mu < \infty$ and $\mathrm{Var}(X_n) = \sigma^2 < \infty$
for all $n$. Let $Y_n = \frac{1}{n}\sum_{i=1}^n X_i$ be a new sequence defined for each $n$. Then

$$\frac{Y_n - \mu}{\sigma/\sqrt{n}} = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1).$$


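Proof. We prove the result by using the convergence of the characteristic
functions, which corresponds to convergence in distribution. It is easier to obtain
the result if we notice that the variables $Z_i = X_i - \mu$ are such that $E(Z_i) = 0$ and
$\mathrm{Var}(Z_i) = E(Z_i^2) = \sigma^2$. Then

$$S_n = \sum_{i=1}^n \frac{Z_i}{\sigma\sqrt{n}} = \frac{Y_n - \mu}{\sigma/\sqrt{n}}.$$

Let us denote by $\varphi_Z(t)$ the characteristic function of the random variables $Z_i$;
then by (2.7) with $a_i = \frac{1}{\sigma\sqrt{n}}$ we have

$$\varphi_{S_n}(t) = \prod_{i=1}^n \varphi_Z\left(\frac{t}{\sigma\sqrt{n}}\right) = \left[\varphi_Z\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n.$$

Now we expand $\varphi_Z(t)$ in powers of $t$:

$$\varphi_Z(t) = 1 - \frac{t^2}{2}\sigma^2 + o(t^2).$$

Hence

$$\varphi_Z\left(\frac{t}{\sigma\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right).$$

Finally, we obtain

$$\lim_{n \to \infty} \varphi_{S_n}(t) = \lim_{n \to \infty}\left[1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n = e^{-\frac{t^2}{2}},$$

which is the characteristic function of the random variable $N(0, 1)$.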
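The theorem can be checked by simulation. The R sketch below is illustrative only and rests on assumptions not made in the text: the $X_i$ are taken uniform on $(0, 1)$, and the seed, sample size and number of replications are arbitrary. It standardizes simulated sample means and compares a few empirical quantiles with those of $N(0, 1)$.

```r
# Illustrative sketch: standardized means of uniform variables (assumed choice)
set.seed(123)
n     <- 500                       # sample size for each mean
reps  <- 5000                      # number of simulated means
mu    <- 0.5                       # mean of U(0, 1)
sigma <- sqrt(1 / 12)              # standard deviation of U(0, 1)
y <- replicate(reps, mean(runif(n)))
z <- (y - mu) / (sigma / sqrt(n))  # (Y_n - mu) / (sigma / sqrt(n))

# Compare a few empirical quantiles with those of N(0, 1)
probs <- c(0.05, 0.5, 0.95)
rbind(empirical = quantile(z, probs), normal = qnorm(probs))
```

A histogram of `z` with the standard normal density overlaid would give the same message graphically.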

The next result, given without proof, involves the infinitely divisible distributions.

Theorem 2.4.16 Let $\{F_n, n \ge 0\}$ be a sequence of infinitely divisible distributions
and assume that $F_n(x) \to F(x)$. Then $F(\cdot)$ is also infinitely divisible.

This result ensures that the limit keeps the same property of being infinitely
divisible as the elements of the sequence.

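Theorem 2.4.17 (Lindeberg’s condition) Let $\{X_n, n = 1, \ldots\}$ be a sequence of
independent random variables such that $E(X_n) = \mu_n < \infty$, $\mathrm{Var}(X_n) = \sigma_n^2 < \infty$
and let $s_n^2 = \sum_{k=1}^n \sigma_k^2$. If, for any $\epsilon > 0$,

$$\lim_{n \to \infty} \frac{1}{s_n^2}\sum_{k=1}^n E\left\{(X_k - \mu_k)^2\,\mathbf{1}_{\{|X_k - \mu_k| > \epsilon s_n\}}\right\} = 0,$$

then the Central Limit Theorem holds, i.e.

$$\frac{\sum_{k=1}^n (X_k - \mu_k)}{s_n} \xrightarrow{d} N(0, 1).$$

Lindeberg’s condition is sufficient but not necessary and it is a very useful tool
to prove asymptotic normality of, e.g., estimators in statistics. This condition
guarantees that, for large $n$, the contribution of each random variable to the total
variance is small.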
2.5 Conditional expectation

As seen in Section 2.1.1, the conditional probability of $A$ given $B$ is defined
as $P(A|B) = P(A \cap B)/P(B)$. In the same way, it is possible to introduce the
conditional distribution of a random variable $X$ with respect to the event $B$ as

$$F_X(x|B) = \frac{P((X \le x) \cap B)}{P(B)}, \quad x \in \mathbb{R},$$

and the expectation with respect to this conditional distribution is naturally
introduced as (see Mikosch (1998) for a similar treatise)

$$E\{X|B\} = \frac{E(X\mathbf{1}_B)}{P(B)} = \frac{1}{P(B)}\int_{\Omega} X(\omega)\mathbf{1}_B(\omega)P(d\omega)$$

$$= \frac{1}{P(B)}\int_{B \cup \bar{B}} X(\omega)\mathbf{1}_B(\omega)P(d\omega) = \frac{1}{P(B)}\int_{B} X(\omega)P(d\omega),$$

where $\mathbf{1}_B$ is the indicator function of the set $B \subset \Omega$, which means $\mathbf{1}_B(\omega) = 1$ if
$\omega \in B$ and 0 otherwise. For discrete random variables, the conditional expectation
takes the form:


$$E\{X|B\} = \sum_i x_i \frac{P(\{\omega : X(\omega) = x_i\} \cap B)}{P(B)} = \sum_i x_i P(X = x_i|B).$$

For continuous random variables with density $f_X(\cdot)$, if we denote $X(B) = \{x = X(\omega) : \omega \in B\}$
we can write

$$E\{X|B\} = \frac{E(X\mathbf{1}_B)}{P(B)} = \frac{1}{P(B)}\int_{\mathbb{R}} x\,\mathbf{1}_{\{X(B)\}}(x) f_X(x)\,dx = \frac{1}{P(B)}\int_{X(B)} x f_X(x)\,dx,$$

where $\mathbf{1}_{\{X(B)\}}(x) = 1$ if $x \in X(B)$ and 0 otherwise. Consider now a discrete
random variable $Y$ that takes distinct values $y_1, y_2, \ldots, y_k$ and define
$A_i = \{\omega : Y(\omega) = y_i\} = Y^{-1}(y_i)$, $i = 1, \ldots, k$. Assume that all $P(A_i)$ are positive. Let
$E|X| < \infty$. Then a new random variable $Z$ can be defined as follows:

$$Z(\omega) = E\{X|Y\}(\omega) \quad \text{where} \quad E\{X|Y\}(\omega) = E\{X|Y(\omega) = y_i\} = E\{X|A_i\} \quad \text{for } \omega \in A_i.$$

For all $\omega \in A_i$ the conditional expectation $E\{X|Y\}(\omega)$ coincides with
$E\{X|A_i\} = E(X\mathbf{1}_{A_i})/P(A_i)$, but, as a function of $\omega \in \Omega$, it is a random variable itself because it
depends on the events generated by $Y(\omega)$, and each value taken by the conditional
expectation $E\{X|Y\}(\omega)$, i.e. each $E\{X|Y\}(\omega) = E\{X|A_i\}$, has its own probability
$P(Y = y_i)$.

If instead of a single set $B$ or a finite number of sets $A_i$, $i = 1, \ldots, k$, we
consider a complete $\sigma$-algebra of events (for example, the one generated by a
generic random variable $Y$), we arrive at the general definition of conditional
expectation, which is given in implicit form as follows.
Definition 2.5.1 Let $X$ be a random variable such that $E|X| < \infty$. A random
variable $Z$ is called the conditional expectation of $X$ with respect to the $\sigma$-algebra
$\mathcal{F}$ if:

(i) $Z$ is $\mathcal{F}$-measurable and

(ii) $Z$ is such that $E(Z\mathbf{1}_A) = E(X\mathbf{1}_A)$ for every $A \in \mathcal{F}$.

The conditional expectation is unique and will be denoted as $Z = E\{X|\mathcal{F}\}$. With
this notation, the equivalence above can be written as

$$E(E\{X|\mathcal{F}\}\mathbf{1}_A) = E(X\mathbf{1}_A) \quad \text{for every } A \in \mathcal{F}. \qquad (2.25)$$

By definition the conditional expectation is a random variable and the above
equality is only true up to null-measure sets. Among the properties of the
conditional expectation, we note only the following. Let $X$ and $Y$ be random variables
and $a$, $b$ two constants. Then, provided that each quantity exists, we have

(i) linearity

$$E\{a \cdot X + b \cdot Y|\mathcal{F}\} = a \cdot E\{X|\mathcal{F}\} + b \cdot E\{Y|\mathcal{F}\};$$

(ii) if we condition with respect to the trivial $\sigma$-algebra, $\mathcal{F}_0 = \{\Omega, \emptyset\}$, we
have

$$E\{X|\mathcal{F}_0\} = E(X);$$

(iii) if $Y$ is $\mathcal{F}$-measurable, then

$$E\{Y \cdot X|\mathcal{F}\} = Y \cdot E\{X|\mathcal{F}\};$$

(iv) choose $X = 1$ in (iii), then it follows that

$$E\{Y|\mathcal{F}\} = Y;$$

(v) choose $A = \Omega$ in (2.25), then it follows that

$$E(E\{X|\mathcal{F}\}) = E(X); \qquad (2.26)$$

(vi) if $X$ is independent of $\mathcal{F}$, then it follows that

$$E\{X|\mathcal{F}\} = E(X)$$

and, in particular, if $X$ and $Y$ are independent, we have $E\{X|Y\} = E\{X|\sigma(Y)\} = E(X)$,
where $\sigma(Y)$ is the $\sigma$-algebra generated by the random variable $Y$.
Another interesting property of the conditional expectation, which is often used
in statistics, is the next one. We give it with a detailed proof because the proof
makes use of most of the above properties.

Theorem 2.5.2 Let $X$ be a square integrable random variable. Then

$$\mathrm{Var}(X) = \mathrm{Var}(E\{X|Y\}) + E(\mathrm{Var}\{X|Y\}).$$

Proof.

$$\mathrm{Var}(X) = E(X - E(X))^2 = E\left\{E\left[(X - E(X))^2 \,|\, Y\right]\right\}$$

$$= E\left\{E\left[(X - E\{X|Y\} + E\{X|Y\} - E(X))^2 \,|\, Y\right]\right\}$$

$$= E\left\{E\left[(X - E\{X|Y\})^2 + (E\{X|Y\} - E(X))^2 + 2(X - E\{X|Y\})(E\{X|Y\} - E(X)) \,|\, Y\right]\right\}$$

$$= E\left\{E\left[(X - E\{X|Y\})^2 \,|\, Y\right]\right\} + E\left\{E\left[(E\{X|Y\} - E(X))^2 \,|\, Y\right]\right\} + 2E\left\{E\left[(X - E\{X|Y\})(E\{X|Y\} - E(X)) \,|\, Y\right]\right\}$$

$$= a + b + c.$$

Consider the first term and notice that

$$E\left\{(X - E\{X|Y\})^2 \,|\, Y\right\} = \mathrm{Var}\{X|Y\}$$

where $\mathrm{Var}\{X|Y\}$ is the variance of $X$ calculated using the conditional distribution
function of $X$ given $Y$. Hence $a = E(\mathrm{Var}\{X|Y\})$. Now, consider the second term.
By measurability of $E\{X|Y\} - E(X)$ w.r.t. $Y$ we have

$$b = E\left\{E\left[(E\{X|Y\} - E(X))^2 \,|\, Y\right]\right\} = E(E\{X|Y\} - E(X))^2$$

and using (2.26) we get

$$b = E(E\{X|Y\} - E(X))^2 = E\left\{E\{X|Y\} - E(E\{X|Y\})\right\}^2 = \mathrm{Var}(E\{X|Y\}).$$

For the last term, using again the measurability of $E\{X|Y\} - E(X)$, we get

$$c = 2E\left\{E\left[(X - E\{X|Y\})(E\{X|Y\} - E(X)) \,|\, Y\right]\right\}$$

$$= 2E\left\{(E\{X|Y\} - E(X))\,E\left[(X - E\{X|Y\}) \,|\, Y\right]\right\}$$

$$= 2E\left\{(E\{X|Y\} - E(X))(E\{X|Y\} - E\{X|Y\})\right\} = 0.$$
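Both property (2.26) and the variance decomposition can be verified exactly on a finite data set, where expectations are plain averages. The R sketch below is only an illustration: the data are made up, `pvar` is a hypothetical helper computing the variance with divisor $n$, and the conditional expectation given a discrete $Y$ is obtained by averaging $X$ within each group.

```r
# Illustrative sketch with made-up data: Y discrete, X numeric
x <- c(1.2, 0.7, 3.1, 2.4, 2.9, 5.0, 4.1, 4.6)
y <- c("a", "a", "b", "b", "b", "c", "c", "c")

pvar <- function(v) mean((v - mean(v))^2)   # variance with divisor n

cond.mean <- ave(x, y, FUN = mean)   # E{X|Y}, one value per observation
cond.var  <- ave(x, y, FUN = pvar)   # Var{X|Y}, one value per observation

mean(cond.mean) - mean(x)            # property (2.26): zero up to rounding
pvar(x) - (pvar(cond.mean) + mean(cond.var))   # variance decomposition: zero up to rounding
```

Here the probability measure is the empirical one, which puts mass $1/n$ on each observation, so the two identities hold up to floating point error rather than only approximately.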

Finally, we present a version of Jensen’s inequality for the conditional
expectation, which is given without proof.

Theorem 2.5.3 (Conditional Jensen’s inequality) Let $X$ be a random variable
on the probability space $(\Omega, \mathcal{A}, P)$, and let $f(\cdot)$ be any real-valued convex
function on $\mathbb{R}$. Assume that $X$ is such that $E|X| < \infty$ and $E|f(X)| < \infty$. Then

$$f(E\{X|\mathcal{F}\}) \le E\{f(X)|\mathcal{F}\}. \qquad (2.27)$$


2.6 Statistics 

Suppose there is a population of individuals on which we want to measure some
quantity of interest. Let us denote by $X$ the random variable which describes this
characteristic in the population. Assume that the distribution of $X$ is characterized
by some parameter $\theta \in \Theta \subset \mathbb{R}^k$. The object of statistical inference is to recover
the value of $\theta$ from a random sample of data extracted from the population. We
denote the random sample with the random vector $(X_1, X_2, \ldots, X_n)$ where $X_i$ is
the $i$-th potential observed value on individual $i$ of the random sample. Each $X_i$
is supposed to be extracted independently from the population $X$ so that all the
$X_i$’s are copies of $X$, i.e. they all have the same distribution as $X$. In this case
we say that $(X_1, X_2, \ldots, X_n)$ is a random sample of independent and identically
distributed (i.i.d.) random variables.

Definition 2.6.1 An estimator $T_n$ of $\theta$ is a function of the random sample which
takes values on $\Theta$ but it is not a function of $\theta \in \Theta$ itself. We write

$$T_n(\omega) = T_n(\omega, (X_1, X_2, \ldots, X_n)).$$

Example 2.6.2 Let $X_i$, $i = 1, \ldots, n$ be an i.i.d. random sample with common
distribution with mean $E(X) = \theta$. The so-called plug-in estimator of $\theta$ is the
sample mean, i.e.

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i,$$

which we expect to be a ‘good’ estimator of $\theta$.

2.6.1 Properties of estimators 

Before judging estimators we need to discuss quality measures. We denote
by $E_\theta(X)$ the expected value of $X$ under the distribution $P_\theta$, i.e.
$E_\theta(X) = \int_{\Omega} X(\omega)P_\theta(d\omega)$. The probability measure $P_\theta$ is supposed to be a member of
the parametric family of models $\{P_\theta, \theta \in \Theta\}$. More precisely, to each sample
size $n \ge 1$ we should consider a parametric family of models $\{P_\theta^n, \theta \in \Theta\}$ and
then introduce the so-called family of experiments $\{\Omega, \mathcal{A}, (P_\theta^n, \theta \in \Theta)\}$, but for
simplicity we drop the index $n$ from the probability measures and we assume this
family of experiments is defined somehow.

Consider again Example 2.6.2. To each $\theta$ in the family $P_\theta$ we can
associate, e.g., the distribution of the Gaussian random variable $N(\mu, \sigma^2)$ with,
e.g. $\theta = (\mu, \sigma^2)$ or $\theta = \mu$ for a given value of $\sigma^2$, or the one of the Bernoulli
random variable $\mathrm{Ber}(\theta)$ with $\theta = p$.

Definition 2.6.3 An estimator $T_n$ of $\theta$ is said to be unbiased if

$$E_\theta(T_n) = \theta, \quad \text{for all } n$$

and asymptotically unbiased if

$$\lim_{n \to \infty} E_\theta(T_n) = \theta.$$

So, an unbiased estimator T n recovers the true value 6 on average. On average 
means that, from random sample to random sample, we can get different values 
of our estimator, but on the average of all possible random samples, it has a nice 
behaviour. If this fact happens independently of the sample size n, then we call 
these estimators unbiased, but if this is true only for large samples, we have only 
asymptotically unbiased estimators. 

Knowing that, on average, an estimator correctly recovers the true value is 
not enough if the variability from sample to sample is too high. So we also need 
to control for the variability. 

Definition 2.6.4 The mean square error (MSE) of an estimator $T_n$ of $\theta$ is
defined as

$$\mathrm{MSE}_\theta(T_n) = E_\theta(T_n - \theta)^2.$$

By adding and subtracting the quantity $E_\theta(T_n)$ in the formula of $\mathrm{MSE}_\theta$, we obtain
an expression of the MSE which involves both the bias, defined as
$\mathrm{Bias}_\theta(T_n) = E_\theta(T_n - \theta)$, and the variance $\mathrm{Var}_\theta(T_n)$ of the estimator:

$$\mathrm{MSE}_\theta(T_n) = E_\theta(T_n \pm E_\theta(T_n) - \theta)^2$$

$$= E_\theta(T_n - E_\theta(T_n))^2 + E_\theta(E_\theta(T_n) - \theta)^2 + 2E_\theta\left[(T_n - E_\theta(T_n))(E_\theta(T_n) - \theta)\right]$$

$$= \mathrm{Var}_\theta(T_n) + (\mathrm{Bias}_\theta(T_n))^2 + 2(E_\theta(T_n) - \theta)E_\theta(T_n - E_\theta(T_n))$$

$$= \mathrm{Var}_\theta(T_n) + (\mathrm{Bias}_\theta(T_n))^2.$$

So, given two estimators $T_n$ and $S_n$, if we want to compare them we need to
evaluate both bias and variance of each at the same time and then choose $T_n$ if
$\mathrm{MSE}_\theta(T_n) < \mathrm{MSE}_\theta(S_n)$, or choose $S_n$ otherwise.

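Example 2.6.5 Let us consider again $\bar{X}_n$ from Example 2.6.2. We have

$$\mathrm{MSE}_\theta(\bar{X}_n) = \mathrm{Var}_\theta(\bar{X}_n) + (\mathrm{Bias}_\theta(\bar{X}_n))^2$$

but

$$E_\theta(\bar{X}_n) = E_\theta\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n}\sum_{i=1}^n E_\theta(X_i) = \frac{1}{n}\sum_{i=1}^n \theta = \frac{n\theta}{n} = \theta.$$

Hence, $\bar{X}_n$ is an unbiased estimator of $\theta$ and the mean square error is then just
its variance. So we calculate it now.

$$\mathrm{Var}_\theta(\bar{X}_n) = \mathrm{Var}_\theta\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}_\theta(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

Therefore we have $\mathrm{MSE}_\theta(\bar{X}_n) = \frac{\sigma^2}{n}$. Let us consider now an estimator $T_n$ of the
following form:

$$T_n = \frac{3X_1 + X_2 + \cdots + X_{n-1} - X_n}{n}.$$

We have that $T_n$ is also unbiased. Indeed,

$$E_\theta(T_n) = \frac{3\theta + (n - 2)\theta - \theta}{n} = \frac{n\theta}{n} = \theta.$$

Let us calculate the variance of $T_n$:

$$\mathrm{Var}_\theta(T_n) = \mathrm{Var}_\theta\left(\frac{3X_1 + X_2 + \cdots + X_{n-1} - X_n}{n}\right) = \frac{3^2\sigma^2 + (n - 2)\sigma^2 + \sigma^2}{n^2} = \frac{(8 + n)\sigma^2}{n^2}$$

$$= \frac{8\sigma^2}{n^2} + \frac{\sigma^2}{n} > \frac{\sigma^2}{n} = \mathrm{Var}_\theta(\bar{X}_n).$$

Therefore, $\mathrm{MSE}_\theta(T_n) > \mathrm{MSE}_\theta(\bar{X}_n)$ and we should prefer $\bar{X}_n$ to $T_n$.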
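The comparison can be reproduced numerically. In the R sketch below, which is only an illustration, the values of $n$, $\sigma^2$ and $\theta$, the seed and the helper `lin.var` are assumptions made for the example. It uses the fact that a linear estimator $\frac{1}{n}\sum_i a_i X_i$ of i.i.d. variables has variance $\sigma^2\sum_i a_i^2/n^2$, and then checks the two mean square errors by simulation with Gaussian data.

```r
# Illustrative sketch: variance of linear estimators (1/n) * sum(a_i * X_i)
n      <- 20
sigma2 <- 4                                 # assumed value of sigma^2
a.mean <- rep(1, n)                         # weights of the sample mean
a.T    <- c(3, rep(1, n - 2), -1)           # weights of T_n in Example 2.6.5
lin.var <- function(a, sigma2) sigma2 * sum(a^2) / length(a)^2

sum(a.T) == n                               # unbiasedness: weights sum to n
c(var.mean = lin.var(a.mean, sigma2),       # sigma^2 / n
  var.T    = lin.var(a.T, sigma2))          # (8 + n) * sigma^2 / n^2

# Monte Carlo check with Gaussian data (theta = 1 is arbitrary)
set.seed(123)
sims <- replicate(5000, {
  x <- rnorm(n, mean = 1, sd = sqrt(sigma2))
  c(mean = mean(x), T = sum(a.T * x) / n)
})
rowMeans((sims - 1)^2)                      # empirical MSE of each estimator
```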

Exercise 2.15 Let $T_n = \frac{1}{n}\sum_{i=1}^n a_i X_i$, with $\sum_{i=1}^n a_i = n$, with $X_i$ i.i.d. random
variables with mean $\theta$ and variance $\sigma^2$. Prove that $T_n$ is an unbiased estimator
of $\theta$ and that the minimal variance of $T_n$ is obtained for $a_i = 1$, $i = 1, \ldots, n$, i.e.
$T_n = \bar{X}_n$.

The previous exercise shows that the sample mean $\bar{X}_n$ is, in terms of the mean
square error, the best estimator of the mean of the population $\theta$ among all
estimators which are linear combinations of the $X_i$, and it is also unbiased. This is
a special case of estimators called best linear unbiased estimator (BLUE).

We have also used the plug-in approach to define an estimator of $\theta$, i.e.
to estimate the mean of the population we used the empirical mean. But plug-in
estimators are not necessarily the best ones. Suppose now we also want to
estimate the variance $\sigma^2$ and consider the empirical variance as an estimator
of $\sigma^2$. Because we also estimate $\mu$ we write $E_\theta$ with $\theta = (\mu, \sigma^2)$ to denote
expectation under the true unknown model. The empirical variance is defined as

$$\bar{S}_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2.$$









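Then, for the empirical variance $\bar{S}_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$,

$$E_\theta(\bar{S}_n^2) = \frac{1}{n}\sum_{i=1}^n E_\theta(X_i - \bar{X}_n)^2$$

but

$$E_\theta(X_i - \bar{X}_n)^2 = E_\theta(X_i^2) + E_\theta(\bar{X}_n^2) - 2E_\theta(X_i\bar{X}_n)$$

$$= \mu^2 + \sigma^2 + \mu^2 + \frac{\sigma^2}{n} - 2E_\theta\left(\frac{X_i^2}{n} + \frac{1}{n}X_i\sum_{j \ne i} X_j\right)$$

$$= 2\mu^2 + \sigma^2\,\frac{n + 1}{n} - 2\,\frac{\mu^2 + \sigma^2}{n} - 2\,\frac{n - 1}{n}\,\mu^2$$

$$= 2\mu^2 + \sigma^2\,\frac{n + 1}{n} - 2\mu^2 - 2\,\frac{\sigma^2}{n} = \frac{n - 1}{n}\,\sigma^2.$$

So $E_\theta(\bar{S}_n^2) = \frac{n - 1}{n}\sigma^2 < \sigma^2$ and $\bar{S}_n^2$ is a biased estimator of $\sigma^2$. We can correct the
estimator as follows:

$$S_n^2 = \frac{1}{n - 1}\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{n}{n - 1}\,\bar{S}_n^2 \quad \text{and hence} \quad E_\theta(S_n^2) = \sigma^2.$$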


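For completeness we mention that, for a Gaussian sample, the corrected estimator
$S_n^2$ (divisor $n - 1$) and the empirical variance $\bar{S}_n^2$ (divisor $n$) satisfy

$$\mathrm{Var}_\theta(S_n^2) = \frac{2\sigma^4}{n - 1} \quad \text{and} \quad \mathrm{Var}_\theta(\bar{S}_n^2) = \frac{2(n - 1)\sigma^4}{n^2}.$$

The proof of this fact is simple algebra but very lengthy and we omit it. But
we remark that, if we compare the mean square error of the two estimators, we
obtain

$$\frac{\mathrm{MSE}_\theta(S_n^2)}{\mathrm{MSE}_\theta(\bar{S}_n^2)} = \frac{\frac{2\sigma^4}{n - 1}}{\frac{(2n - 1)\sigma^4}{n^2}} = \frac{2n^2}{2n^2 - 3n + 1} > 1,$$

which shows that, in the trade-off between bias and variance, the biased estimator $\bar{S}_n^2$ is
better than its corrected version $S_n^2$. Of course, those differences vanish
asymptotically.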
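A short numerical check may help here. The R sketch below is illustrative only: the sample size, the seed, the value of $\sigma^2$ and the names `s2.bar`, `s2`, `mse.bar`, `mse.s2` are assumptions made for the example. It relies on the fact that R's `var()` uses the divisor $n - 1$, and it evaluates the two mean square errors from the Gaussian formulas stated in the text.

```r
# Illustrative sketch: biased and corrected variance estimators
set.seed(123)
n <- 10
x <- rnorm(n, mean = 0, sd = 2)              # assumed Gaussian sample
s2.bar <- mean((x - mean(x))^2)              # empirical variance, divisor n
s2     <- var(x)                             # corrected version, divisor n - 1
all.equal(s2.bar, (n - 1) / n * s2)          # TRUE

# MSE of each estimator under Gaussian sampling, from the stated variances
sigma2 <- 4
mse.bar <- 2 * (n - 1) * sigma2^2 / n^2 + (sigma2 / n)^2   # variance + bias^2
mse.s2  <- 2 * sigma2^2 / (n - 1)                          # unbiased: variance only
mse.s2 / mse.bar                             # equals 2 n^2 / (2 n^2 - 3 n + 1)
```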


Definition 2.6.6 An estimator $T_n$ of $\theta$ is said to be consistent if for all $\epsilon > 0$, we
have that

$$\lim_{n \to \infty} P_\theta(|T_n - \theta| \ge \epsilon) = 0.$$

Clearly, by Chebyshev’s inequality (2.23) one can usually prove consistency
using the mean square error

$$P_\theta(|T_n - \theta| \ge \epsilon) \le \frac{E_\theta(T_n - \theta)^2}{\epsilon^2}$$

and for unbiased estimators, one just needs to check that the variance of the
estimator converges to zero. Consistent estimators are convenient also because, by
using the continuous mapping theorem 2.4.10, one has that, if $g(\cdot)$ is a continuous
function and $T_n$ is a consistent estimator of $\theta$, then $g(T_n)$ is a consistent estimator
of $g(\theta)$.

In i.i.d. sampling when all conditions are fulfilled, one can immediately prove
that the sample mean $\bar{X}_n$ is a consistent estimator of the mean of the population
using either the law of large numbers or, alternatively, Chebyshev’s inequality
recalling that $E_\theta(\bar{X}_n - \theta)^2 = \sigma^2/n$.

2.6.2 The likelihood function 

Suppose we have a sample of i.i.d. observations $X_i$, $i = 1, \ldots, n$, with common
distribution indexed by some parameter $\theta \in \Theta$. Seen as a random vector, the
sample $(X_1, X_2, \ldots, X_n)$ has its own probability. So, for a given set of observed
values $(x_1, x_2, \ldots, x_n)$ from the random vector $(X_1, X_2, \ldots, X_n)$, we might
wonder what is the probability that these data come from a given model specified
by $\theta$. Assume that the $X_i$’s are discrete random variables with probability mass
function $p(x; \theta) = P_\theta(X = x)$. Let us construct the probability of the observed
sample as

$$P_\theta(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \prod_{i=1}^n p(x_i; \theta).$$

Seen as a function of $\theta \in \Theta$ and given the observed values
$(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$, this quantity is called the ‘likelihood of $\theta$ given the sample
data’ and we write

$$L_n(\theta) = L_n(\theta|x_1, \ldots, x_n) = \prod_{i=1}^n p(x_i; \theta).$$

In the case of continuous random variables with density function $f(x; \theta)$ we denote
the likelihood as

$$L_n(\theta) = L_n(\theta|x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i; \theta).$$

Now recall that $f(x) \ne P(X = x) = 0$ for continuous random variables, so it is
important to interpret $L_n(\theta)$ as the likelihood of $\theta$, rather than the probability of
the sample. Indeed, $L_n(\theta)$ weights different values of $\theta \in \Theta$ on the basis of the
observed (and given) data. This allows us to define a general approach in the
search of estimators of the unknown parameter $\theta$, as we will see shortly.

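The likelihood function has several derived quantities of interest. The
log-likelihood $\ell_n(\theta) = \log L_n(\theta)$ plays some role and it is such that, under some
regularity conditions, i.e. when the order of integration and derivation can be
exchanged, we have that

$$E_\theta\left\{\frac{\partial}{\partial\theta}\ell_n(\theta)\right\} = E_\theta\left\{\frac{1}{L_n(\theta)}\frac{\partial}{\partial\theta}L_n(\theta)\right\}$$

$$= \int_{\mathbb{R}^n} \frac{1}{L_n(\theta|x_1, \ldots, x_n)}\frac{\partial}{\partial\theta}L_n(\theta|x_1, \ldots, x_n) \times L_n(\theta|x_1, \ldots, x_n)\,dx_1 \ldots dx_n$$

$$= \int_{\mathbb{R}^n} \frac{\partial}{\partial\theta}L_n(\theta|x_1, \ldots, x_n)\,dx_1 \ldots dx_n$$

$$= \frac{\partial}{\partial\theta}\int_{\mathbb{R}^n} L_n(\theta|x_1, \ldots, x_n)\,dx_1 \ldots dx_n$$

$$= \frac{\partial}{\partial\theta}1 = 0.$$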

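The function $\frac{\partial}{\partial\theta}\ell_n(\theta)$ is called the score function and the variance of the score
function is called the Fisher information

$$\mathcal{I}_n(\theta) = E_\theta\left\{\left(\frac{\partial}{\partial\theta}\ell_n(\theta)\right)^2\right\}.$$

Further, under the same regularity conditions, it is possible to show that the
variance of the score function can be obtained by the second derivative (Hessian
matrix in the multidimensional case) of the log-likelihood

$$\mathcal{I}_n(\theta) = E_\theta\left\{\left(\frac{\partial}{\partial\theta}\ell_n(\theta)\right)^2\right\} = -E_\theta\left\{\frac{\partial^2}{\partial\theta^2}\ell_n(\theta)\right\}. \qquad (2.28)$$

Indeed, by differentiating $\ell_n(\theta)$ twice we obtain

$$\frac{\partial^2}{\partial\theta^2}\ell_n(\theta) = \frac{1}{L_n(\theta)}\frac{\partial^2}{\partial\theta^2}L_n(\theta) - \left(\frac{1}{L_n(\theta)}\frac{\partial}{\partial\theta}L_n(\theta)\right)^2 = \frac{1}{L_n(\theta)}\frac{\partial^2}{\partial\theta^2}L_n(\theta) - \left(\frac{\partial}{\partial\theta}\ell_n(\theta)\right)^2,$$

hence, reorganizing the terms and taking the expectation, we obtain

$$E_\theta\left\{\frac{\partial^2}{\partial\theta^2}\ell_n(\theta)\right\} + E_\theta\left\{\left(\frac{\partial}{\partial\theta}\ell_n(\theta)\right)^2\right\} = E_\theta\left\{\frac{1}{L_n(\theta)}\frac{\partial^2}{\partial\theta^2}L_n(\theta)\right\}$$

$$= \int_{\mathbb{R}^n} \frac{1}{L_n(\theta|x_1, \ldots, x_n)}\frac{\partial^2}{\partial\theta^2}L_n(\theta|x_1, \ldots, x_n)\,L_n(\theta|x_1, \ldots, x_n)\,dx_1 \ldots dx_n$$

$$= \frac{\partial^2}{\partial\theta^2}\int_{\mathbb{R}^n} L_n(\theta|x_1, \ldots, x_n)\,dx_1 \ldots dx_n = 0,$$

which proves the fact. In analogy with $\ell_n(\theta)$, we can define the Fisher information
for the single random variable (or for the model). For a random variable $X$, we
can introduce $\ell(\theta) = \log f(x; \theta)$ if $X$ is continuous or $\ell(\theta) = \log p(x; \theta)$ if $X$
is discrete. Then the Fisher information is defined as follows:

$$\mathcal{I}(\theta) = E_\theta\left\{\left(\frac{\partial}{\partial\theta}\ell(\theta)\right)^2\right\} = -E_\theta\left\{\frac{\partial^2}{\partial\theta^2}\ell(\theta)\right\}.$$

For i.i.d. samples, the likelihood $L_n(\theta)$ factorizes into the product of the single
densities (or probability mass functions) and hence, in formula (2.28), the
log-likelihood $\ell_n(\theta)$ takes the form of the sum of the log-densities (log-probabilities)
and then the Fisher information $\mathcal{I}_n(\theta)$ can be rewritten as

$$\mathcal{I}_n(\theta) = \sum_{i=1}^n \mathcal{I}(\theta) = n\,\mathcal{I}(\theta).$$

Why the quantities $\mathcal{I}(\theta)$ and $\mathcal{I}_n(\theta)$ are called information will be clear in the
next section.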
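As a worked illustration of these identities, consider the Bernoulli model, where expectations reduce to sums over the two points of the support. The R sketch below is only an example under assumed values (the parameter $p = 0.3$ and the sample size $n = 50$ are arbitrary, and the variable names are hypothetical). It checks that the score has zero mean and that the variance of the score coincides with minus the expected second derivative, both being equal to $1/(p(1 - p))$.

```r
# Illustrative sketch: Bernoulli model, log p(x; p) = x log p + (1 - x) log(1 - p)
p <- 0.3                                   # assumed parameter value
x <- c(0, 1)                               # support of X
prob <- c(1 - p, p)                        # P(X = 0), P(X = 1)

score   <- x / p - (1 - x) / (1 - p)       # first derivative of the log-pmf in p
hessian <- -x / p^2 - (1 - x) / (1 - p)^2  # second derivative of the log-pmf in p

sum(prob * score)                          # E(score) = 0
info.var  <- sum(prob * score^2)           # variance of the score
info.hess <- -sum(prob * hessian)          # minus expected second derivative
c(info.var, info.hess, 1 / (p * (1 - p)))  # all three coincide

n <- 50
n * info.var                               # Fisher information of an i.i.d. sample
```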

2.6.3 Efficiency of estimators 

Another way to compare estimators is to consider efficiency in terms of the ratio of
their variances. In particular, for unbiased or consistent estimators, this approach
is equivalent to a comparison based on the mean square error. The following
general result states that, under regularity conditions, the variance of any estimator
of $\theta$ in a statistical model satisfies this inequality.

Theorem 2.6.7 (Cramér-Rao)

$$\mathrm{Var}_\theta(T_n) \ge \frac{\left(1 + \frac{\partial}{\partial\theta}\mathrm{Bias}_\theta(T_n)\right)^2}{\mathcal{I}_n(\theta)}.$$

So if we restrict the attention to unbiased estimators we can reread the above
result as follows: no matter which estimator one chooses, the best one can do is
to obtain an estimator whose variance is not less than the inverse of the Fisher
information. So the Fisher information, and necessarily the likelihood of a model,
describe the maximal information which can be extracted from the data coming
from that particular model.
2.6.4 Maximum likelihood estimation

If we study the likelihood $L_n(\theta)$ as a function of $\theta$ given the $n$ numbers
$(X_1 = x_1, \ldots, X_n = x_n)$ and we find that this function has a maximum, we can use the
point at which this maximum is attained as an estimate of $\theta$. In general we define the maximum likelihood
estimator of $\theta$, and we abbreviate this with MLE, as the following estimator

$$\hat{\theta}_n = \arg\max_{\theta \in \Theta} L_n(\theta) = \arg\max_{\theta \in \Theta} L_n(\theta|X_1, X_2, \ldots, X_n)$$

provided that the maximum exists. The estimator $\hat{\theta}_n$ is a real estimator because
in its definition it depends on the random vector $(X_1, \ldots, X_n)$ and, of course, it
does not depend on $\theta$.


Example 2.6.8 Let $X_i$, $i = 1, \ldots, n$, be an i.i.d. sample extracted from the
Gaussian distribution $N(\mu, \sigma^2)$. For simplicity, assume $\sigma^2$ is known. We want to find
the MLE of $\mu$. Hence

$$L_n(\mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(X_i - \mu)^2}{2\sigma^2}}.$$

Instead of maximizing $L_n(\mu)$ we maximize the log-likelihood $\ell_n(\mu) = \log L_n(\mu)$

$$\ell_n(\mu) = n\log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) - \sum_{i=1}^n \frac{(X_i - \mu)^2}{2\sigma^2}$$

but maximizing $\ell_n(\mu)$ is equivalent to minimizing $-\ell_n(\mu)$. Moreover, the
maximum in $\mu$ does not depend on the first term of $\ell_n(\mu)$, which contains only constants,
hence we just need to solve

$$\hat{\mu}_n = \arg\min_{\mu} \frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \mu)^2$$

but this minimum is attained exactly at $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ by the properties of the arithmetic
mean⁴. Hence the ML estimator of $\mu$ is $\hat{\mu}_n = \bar{X}_n$.
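The same estimate can be obtained numerically, which is the route one has to take when no closed form is available. The following R sketch is an illustration only: the simulated data, the seed, the known standard deviation and the function name `mloglik` are assumptions made for the example. It minimizes minus the Gaussian log-likelihood in $\mu$ with the base R function `optimize` and compares the result with the sample mean.

```r
# Illustrative sketch: numerical MLE of mu for Gaussian data, sigma^2 known
set.seed(123)
sigma <- 2                                   # assumed known standard deviation
x <- rnorm(100, mean = 1, sd = sigma)        # simulated sample (mu = 1 is arbitrary)

# minus log-likelihood as a function of mu
mloglik <- function(mu) -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))

fit <- optimize(mloglik, interval = range(x), tol = 1e-8)
c(numerical = fit$minimum, sample.mean = mean(x))
```

The search interval `range(x)` is a convenient choice here because the sample mean always lies between the smallest and the largest observation.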

Exercise 2.16 Consider the setup of Example 2.6.8. Find the maximum likelihood
estimator of $\theta = (\mu, \sigma^2)$.

Exercise 2.17 Let $X_i$, $i = 1, \ldots, n$, be i.i.d. random variables distributed as
$\mathrm{Ber}(p)$. Find the maximum likelihood estimator of $p$.

⁴ It is easy to show that: $\min_a \sum_{i=1}^n (x_i - a)^2 = \sum_{i=1}^n (x_i - \bar{x}_n)^2$.

The maximum likelihood estimators have the following properties under very
mild conditions in the i.i.d. case. We do not state these conditions here because
in what follows we will discuss more complicated settings, but we mention these
properties because they are the typical result for ML estimators.

(i) these estimators are usually biased but asymptotically unbiased and
hence consistent;

(ii) their limiting variance attains the Cramér-Rao bound for unbiased
estimators, i.e. the limiting variance is the reciprocal (or inverse in the
multidimensional case) of the Fisher information $\mathcal{I}(\theta)$;

(iii) they are asymptotically normally distributed, i.e. if $T_n$ is the MLE of $\theta$, then
$\sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, \mathcal{I}^{-1}(\theta))$;

(iv) these estimators are also invariant to reparametrization. For example,
if $\eta = g(\theta)$ and $g(\cdot)$ is some transform, then the MLE for the new
parameter $\eta$ can be obtained as $\hat{\eta}_n = g(\hat{\theta}_n)$ where $\hat{\theta}_n$ is the MLE for $\theta$.


2.6.5 Moment type estimators

Another approach to derive estimators is to use the method of moments (for a
complete treatment see, e.g. Durham and Gallant (2002); Gallant and Tauchen
(1996); Hall (2005)). The idea is to match the moments of the population with
the empirical moments in order to get estimators of the parameters. In practice,
the estimators are obtained after solving a system of equations like these

$$\mu_j = E_\theta(X^j) = \frac{1}{n}\sum_{i=1}^n X_i^j, \quad j = 1, \ldots, k,$$

where $k$ is the number of parameters to estimate.

Example 2.6.9 Consider the setup of Example 2.6.8 with $\sigma^2$ unknown, and let
us find moment-type estimators of $\mu$ and $\sigma^2$. We know that
$\sigma^2 = \mathrm{Var}_\theta(X) = E_\theta(X^2) - \mu^2 = \mu_2 - \mu_1^2$, hence we need to set

$$\begin{cases} \mu_1 = \frac{1}{n}\sum_{i=1}^n X_i \\ \mu_2 = \frac{1}{n}\sum_{i=1}^n X_i^2 \end{cases}$$

and then $\hat{\mu}_n = \bar{X}_n$ and

$$\hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - (\bar{X}_n)^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2.$$
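The two moment equations translate directly into code. The R sketch below is illustrative only (the simulated data, the seed and the variable names are assumptions): it computes the first two empirical moments and derives the estimators of $\mu$ and $\sigma^2$ from them.

```r
# Illustrative sketch: moment-type estimators for a Gaussian sample
set.seed(123)
x <- rnorm(200, mean = 1, sd = 2)     # simulated data, parameters are arbitrary
m1 <- mean(x)                         # first empirical moment
m2 <- mean(x^2)                       # second empirical moment
mu.hat     <- m1
sigma2.hat <- m2 - m1^2               # equals mean((x - mean(x))^2)
c(mu.hat = mu.hat, sigma2.hat = sigma2.hat)
```

Note that `sigma2.hat` is the variance with divisor $n$, so it differs from `var(x)` by the factor $(n - 1)/n$.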


2.6.6 Least squares method

This method is one of the oldest in statistics and it is based on the minimization
of the quadratic norm between some theoretical function of the random variable
and the corresponding function on the observed data. A typical application is
regression analysis, but the general setup is to solve a minimization problem
which involves the residuals in some model. Assume $(X_i, Y_i)$, $i = 1, \ldots, n$, are
the observations and assume some statistical relationship between the variables
$X$ and $Y$ like $Y = f(X; \theta)$. Define the residuals as $R_i = Y_i - f(X_i; \theta)$; then the
least squares method consists in finding the solution to

$$\hat{\theta}_n = \arg\min_{\theta}\sum_{i=1}^n (f(X_i; \theta) - Y_i)^2 = \arg\min_{\theta}\sum_{i=1}^n R_i^2.$$

The residual $R_i$ is assumed to be related to the error due to randomness in the
sample. If $f(\cdot)$ is a linear function, the resulting estimator has several properties
and can be obtained in an explicit form. In the nonlinear case, some numerical
method is needed to find the solution to this quadratic problem.
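The following R sketch illustrates both routes on a linear model, so that the numerical answer can be compared with the explicit one. It is only an illustration: the simulated data, the seed, the model $f(x; \theta) = \theta_1 + \theta_2 x$ and the function name `rss` are assumptions made for the example. The sum of squared residuals is minimized numerically with `optim`, and the explicit solution is obtained with `lm`.

```r
# Illustrative sketch: least squares for the linear model f(x; theta) = theta1 + theta2 * x
set.seed(123)
x <- seq(0, 10, length.out = 50)
y <- 1 + 0.5 * x + rnorm(50, sd = 1)        # simulated data, true theta = (1, 0.5)

rss <- function(theta) sum((theta[1] + theta[2] * x - y)^2)   # sum of squared residuals

num <- optim(c(0, 0), rss, method = "BFGS", control = list(reltol = 1e-12))
fit <- lm(y ~ x)                            # explicit solution of the linear case
rbind(numerical = num$par, explicit = unname(coef(fit)))
```

For a nonlinear $f(\cdot)$ only the body of `rss` would change, while the call to `optim` would stay the same; the starting value then matters more than in the linear case.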

2.6.7 Estimating functions 

Similarly to the least squares or moment-type methods, one can consider
estimating functions as some form of distance between the true parameter and
the sample counterpart. Estimating functions are functions of both the data and
the parameter, i.e. functions of the form $H(X_1, \ldots, X_n; \theta)$ with the property
that $E_\theta\{H(X_1, \ldots, X_n; \theta)\} = 0$ if $\theta$ is the true value. The estimator is then
obtained in implicit form as the solution to

$$\hat{\theta}_n : \; H(X_1, \ldots, X_n; \hat{\theta}_n) = 0.$$

For example, if we take as $H(X_1, \ldots, X_n; \theta)$ minus the derivative of the
log-likelihood function, i.e.

$$H(X_1, \ldots, X_n; \theta) = -\frac{\partial}{\partial\theta}\ell_n(\theta),$$

the value of $\theta$ which makes $H(\cdot; \theta) = 0$ is nothing but the maximum likelihood
estimator of $\theta$ when the conditions for which the score function has zero mean
are satisfied.
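As a small illustration, the R sketch below solves an estimating equation numerically with the base R function `uniroot`. The exponential model used here is an assumption made only for the example and is not taken from the text: its log-likelihood is $n\log\theta - \theta\sum_i x_i$, so minus the score is $H(\theta) = -(n/\theta - \sum_i x_i)$ and the root has the closed form $1/\bar{x}_n$, which gives something to compare against.

```r
# Illustrative sketch: estimating function H = minus the score of an exponential model
set.seed(123)
x <- rexp(100, rate = 2)                   # simulated data, true rate is arbitrary
# log-likelihood: n * log(theta) - theta * sum(x), so the score is n / theta - sum(x)
H <- function(theta) -(length(x) / theta - sum(x))

root <- uniroot(H, interval = c(1e-3, 100), tol = 1e-10)
c(root = root$root, mle = 1 / mean(x))     # the root is the MLE 1 / mean(x)
```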

2.6.8 Confidence intervals 

Along with point estimates produced by some estimator T„ it is sometimes 
convenient to get an indication of an interval of plausible values centered around 
the estimate which is likely (or supposed) to contain the unknown parame¬ 
ter 9. Although, for any deterministic interval [a, b] the probability of the event 
‘9 e [a, bf is either 1 or 0, it is still possible to obtain some information using 
both the mean and the variance of the estimator. In particular, if an estimator is 
unbiased (or consistent) and we know its variance, we can use a version of the 
central limit theorem 5 which gives as a result 


T n -Eg(T n ) d 


Z ~ N(0, 1) 


VVar 0 (r„) 


5 To be proved case by case. 
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and so we can approximate the distribution of T n for large n as 
T n ~ EgT n + ZVVar 0 (r„) ~ 9 + ZjVaig(T n ). 


In this case, the quantity 


Tn ~ 0 d 

VVar e (T„) 


N(0, 1) 


is the standardized distance of the estimator T n from the target unknown value 9. 
We can try to control this distance in probability in the following way 


Pe 



< 


T n ~ 0 
VVar 0 (T n ) 


— 1 — a 


where a e (0, 1) is interpreted as an error and z, q — P(Z < q) is the q- th quantile 
of the standard normal distribution. If we want to interpret this probability of ‘an 
interval for the random variable T n ’ in ‘an interval for the unknown 9’ we need 
to reorganize the writing as follows: 

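$$P_\theta\left\{ \left(T_n - z_{\alpha/2}\sqrt{\mathrm{Var}_\theta(T_n)} > \theta\right) \cap \left(T_n - z_{1-\alpha/2}\sqrt{\mathrm{Var}_\theta(T_n)} < \theta\right) \right\} = 1 - \alpha$$

or

$$P_\theta\left\{ T_n - z_{1-\alpha/2}\sqrt{\mathrm{Var}_\theta(T_n)} < \theta < T_n - z_{\alpha/2}\sqrt{\mathrm{Var}_\theta(T_n)} \right\} = 1 - \alpha.$$

Now, remember that, by symmetry of the Gaussian distribution, we have

$$z_{\alpha/2} = -z_{1-\alpha/2},$$

thus, finally

$$P_\theta\left\{ T_n - z_{1-\alpha/2}\sqrt{\mathrm{Var}_\theta(T_n)} < \theta < T_n + z_{1-\alpha/2}\sqrt{\mathrm{Var}_\theta(T_n)} \right\} = 1 - \alpha.$$

The above interval is usually rewritten as $\theta \in \left[T_n \pm z_{1-\alpha/2}\sqrt{\mathrm{Var}_\theta(T_n)}\right]$ but when
we see the writing

$$P_\theta\left\{ \theta \in \left[T_n \pm z_{1-\alpha/2}\sqrt{\mathrm{Var}_\theta(T_n)}\right] \right\} = 1 - \alpha$$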

it is important to notice that $\theta$ is a given and unknown fixed constant, but the
extremes of the interval vary because $T_n$ varies. This is interpreted as follows: $100(1 - \alpha)\%$
of the times, the above interval contains the true value $\theta$, and the remaining $100\alpha\%$
of the times it produces intervals which do not contain the true value.

In some cases, for example $T_n = \bar X_n$ with known variance $\sigma^2$ and a Gaussian
random sample, the above interval is exact, i.e. we don’t need the central limit
theorem. In most cases, however, the interval is only approximate.
Moreover, the calculation of the interval depends
on the variance of the estimator, which usually needs to be estimated. In such
a case, a consistent estimator of $\mathrm{Var}_\theta(T_n)$ is needed in order to justify
the approximation of the asymptotic distribution of $T_n$ and hence obtain the
confidence interval.
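
The frequentist reading of the interval can be checked by simulation. The following is a minimal sketch, not taken from the text: it assumes the exact case just mentioned (Gaussian sample, known variance), and the names n, mu, sigma and covers are illustrative choices.

```r
# Empirical coverage of the interval [mean(x) +/- z * sigma / sqrt(n)]
# for a Gaussian sample with known sigma (illustrative values).
set.seed(1)
n <- 50; mu <- 5; sigma <- 2; alpha <- 0.05
z <- qnorm(1 - alpha / 2)            # quantile z_{1 - alpha/2}
half <- z * sigma / sqrt(n)          # half-length of the interval

covers <- replicate(5000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - half < mu) && (mu < mean(x) + half)
})
coverage <- mean(covers)             # should be close to 1 - alpha
coverage
```

The true value mu stays fixed across replications while the interval moves with each sample, which is the point made above.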

Example 2.6.10 Consider an i.i.d. sample from the Bernoulli distribution with
parameter $p$. The MLE of $p$ is the sample mean $\bar X_n$ which we denote by $\hat p_n$ (see
Exercise 2.17). The estimator is unbiased, $E_p(\hat p_n) = p$, because it is just the sample
mean, and using the central limit theorem we get:

$$\hat p_n \sim p + \sqrt{\mathrm{Var}_p(\hat p_n)}\, Z = p + \sqrt{\frac{p(1 - p)}{n}}\, Z;$$

clearly we cannot calculate the approximate interval as

$$p \in \left[ \hat p_n \pm z_{1-\alpha/2} \sqrt{\frac{p(1 - p)}{n}} \right]$$

because $p$ is unknown. Then we need to estimate the variance of $\hat p_n$. By the law
of large numbers, the estimator $\hat p_n$ is consistent and hence we can propose the
following approximated interval

$$p \in \left[ \hat p_n \pm z_{1-\alpha/2} \sqrt{\frac{\hat p_n(1 - \hat p_n)}{n}} \right]$$

but this is only asymptotically of level $\alpha$. Another, conservative, approach is to
consider that $f(x) = x(1 - x)$, for $x \in (0, 1)$, has its maximum at $x = 0.5$. Thus,
we can use $p = 0.5$ to estimate the largest variance of $\hat p_n$ and obtain the following
confidence interval

$$p \in \left[ \hat p_n \pm z_{1-\alpha/2} \frac{1}{2\sqrt{n}} \right]$$

whose length is independent of the actual value of $\hat p_n$.
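
The two intervals of Example 2.6.10 can be compared numerically. This is only a sketch under assumed values: the sample size n = 200 and the true p = 0.3 are arbitrary choices used for illustration.

```r
# Plug-in and conservative intervals for a Bernoulli proportion.
set.seed(1)
n <- 200; p <- 0.3; alpha <- 0.05      # illustrative values
x <- rbinom(n, size = 1, prob = p)     # Bernoulli sample
p_hat <- mean(x)                       # MLE of p
z <- qnorm(1 - alpha / 2)

# interval with estimated variance p_hat * (1 - p_hat) / n
plug_in <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
# conservative interval using the largest variance 0.25 / n
conservative <- p_hat + c(-1, 1) * z / (2 * sqrt(n))

rbind(plug_in, conservative)
```

Since $x(1 - x) \le 1/4$, the conservative interval is never shorter than the plug-in one, and its length $z_{1-\alpha/2}/\sqrt{n}$ does not depend on the data.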


For asymptotically efficient estimators (like the maximum likelihood estimator in
most of the cases) the asymptotic variance coincides with the inverse of the Fisher
information and it is possible to obtain it numerically as part of the estimation
procedure. In particular, the mle function of R produces both estimates and
confidence intervals based on this strategy.


2.6.9 Numerical maximization of the likelihood 

Maximum likelihood estimators cannot always be obtained
in explicit form. In applications to real data, it is important
to know whether mathematical results about the optimality of the MLE exist and
then to find the estimators numerically. R offers a prebuilt generic function called
mle in the package stats4 which can be used to maximize a likelihood. The
mle function actually minimizes the negative log-likelihood $-\ell(\theta)$ as a function
of the parameter $\theta$. For example, consider a sample of $n = 1000$ observations
from a Gaussian law $N(\mu = 5, \sigma^2 = 4)$, and let us estimate the parameters
numerically:

R> set.seed(123)
R> library("stats4")
R> x <- rnorm(1000, mean = 5, sd = 2)
R> log.lik <- function(mu = 1, sigma = 1) -sum(dnorm(x, mean = mu,
+     sd = sigma, log = TRUE))
R> fit <- mle(log.lik, lower = c(0, 0), method = "L-BFGS-B")
R> fit

Call:
mle(minuslogl = log.lik, method = "L-BFGS-B", lower = c(0, 0))

Coefficients:
      mu    sigma
5.032256 1.982398

and, using the explicit estimators for $\mu$ and $\sigma$, we get:

R> mean(x)
[1] 5.032256
R> sd(x)
[1] 1.98339

which almost coincide numerically (remember from Exercise 2.16 that the MLE
of $\sigma^2$ is $S_n^2$, which divides by $n$, while sd calculates the square root of the sample
variance with divisor $n - 1$). What is worth knowing is that the
output of the mle function is an object which contains several pieces of information,
including the value of $\ell(\theta)$ at the point of its maximum

R> logLik(fit)
'log Lik.' -2103.246 (df=2)

the variance-covariance matrix of the estimators, which is obtained by inverting the
Hessian matrix at the point $\hat\theta$ corresponding to the maximum likelihood estimate,
assuming that, under the regularity conditions for the data generating model,
(2.28) holds and that the MLE is asymptotically efficient

R> vcov(fit)
                mu        sigma
mu    3.929901e-03 8.067853e-10
sigma 8.067853e-10 1.964946e-03

Similarly, approximate confidence intervals and the complete summary of the
estimated parameters can be obtained using respectively the functions confint
and summary:

R> confint(fit) 

Profiling... 


2.5 % 97.5 % 


mu 4.909269 5.155242 
sigma 1.898595 2.072562 

R> summary(fit) 

Maximum likelihood estimation 

Call: 

mle(minuslogl = log.lik, method = "L-BFGS-B", lower = c(0, 0)) 

Coefficients: 

Estimate Std. Error 
mu 5.032256 0.06268893 
sigma 1.982398 0.04432771 

-2 log L: 4206.492 

In our example above, we have specified the option method = "L-BFGS-B", which tells R
to use an optimizer that can maximize the likelihood when parameters are subject to box constraints. In our
example, we specified a vector of lower bounds for the two parameters using the
argument lower. We will return to the general problem of function minimization
and numerical optimization in Appendix A.
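
The strategy of inverting the Hessian of the negative log-likelihood can be reproduced by hand with the base function optim. The following is a sketch under stated assumptions rather than a description of how mle is implemented: the function name nll, the lower bound 1e-6 for sigma, and the use of Wald-type intervals (estimate plus or minus a normal quantile times the standard error) are choices made here for illustration. Note that confint above is based on profiling, so its limits need not coincide exactly with the Wald limits.

```r
# Negative log-likelihood of a Gaussian sample, minimized with optim();
# standard errors come from the inverse of the numerical Hessian.
set.seed(123)
x <- rnorm(1000, mean = 5, sd = 2)

nll <- function(par) -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))

opt <- optim(par = c(mu = 1, sigma = 1), fn = nll, method = "L-BFGS-B",
             lower = c(-Inf, 1e-6),   # assumed bound keeping sigma positive
             hessian = TRUE)

V  <- solve(opt$hessian)              # approximate variance-covariance matrix
se <- sqrt(diag(V))                   # approximate standard errors
z  <- qnorm(0.975)
wald <- cbind(lower = opt$par - z * se, upper = opt$par + z * se)

opt$par
wald
```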

2.6.10 The δ-method

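The so-called δ-method is a technique to derive the approximate distribution of
an estimator which is a function of another consistent estimator. Assume that $T_n$
is a consistent estimator for $\theta$ such that

$$\sqrt{n}\,(T_n - \theta) \xrightarrow{d} N(0, \sigma^2).$$

Now let $g(\theta)$ be a differentiable and non-null function of $\theta$. Then

$$\sqrt{n}\,(g(T_n) - g(\theta)) \xrightarrow{d} N\left(0, \left(\frac{\partial}{\partial\theta} g(\theta)\right)^2 \sigma^2\right). \qquad (2.29)$$

We now prove this result under the assumption that the derivative $\frac{\partial}{\partial\theta} g(\cdot)$ is a continuous function.
Proof. Using Taylor expansion, we can write

$$g(T_n) = g(\theta) + \frac{\partial}{\partial\theta} g(\tilde\theta)(T_n - \theta)$$

for some $\tilde\theta$ between $T_n$ and $\theta$. Since $T_n$ is a consistent estimator of $\theta$ then also
$\tilde\theta \xrightarrow{p} \theta$. Since $\frac{\partial}{\partial\theta} g(\cdot)$ is continuous, by the continuous mapping Theorem 2.4.10 we
also have that

$$\frac{\partial}{\partial\theta} g(\tilde\theta) \xrightarrow{p} \frac{\partial}{\partial\theta} g(\theta).$$

Therefore,

$$\sqrt{n}\,(g(T_n) - g(\theta)) = \frac{\partial}{\partial\theta} g(\tilde\theta)\, \sqrt{n}\,(T_n - \theta).$$

By assumption $\sqrt{n}\,(T_n - \theta) \xrightarrow{d} N(0, \sigma^2)$, hence applying Slutsky’s Theorem
2.4.9, we end up with the result (2.29).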
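
A quick simulation can illustrate (2.29). The sketch below is only an illustration under assumed choices: $T_n$ is the sample mean of a Gaussian sample, $g(\theta) = \theta^2$, and the names g, dg and stat are hypothetical.

```r
# delta-method check: sqrt(n) * (g(T_n) - g(mu)) should have standard
# deviation close to |g'(mu)| * sigma for large n.
set.seed(1)
n <- 400; mu <- 2; sigma <- 1          # illustrative values
g  <- function(theta) theta^2          # assumed transformation
dg <- function(theta) 2 * theta        # its derivative

stat <- replicate(5000, {
  T_n <- mean(rnorm(n, mean = mu, sd = sigma))
  sqrt(n) * (g(T_n) - g(mu))
})

c(simulated_sd = sd(stat), delta_sd = abs(dg(mu)) * sigma)
```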


2.7 Solution to exercises 


Solution 2.1 (Exercise 2.1) Notice that by axiom (ii) $1 = P(\Omega)$. By definition
of complementary set, we have that $\Omega = A \cup \bar A$, where $A$ and $\bar A$ are disjoint
sets. Then, by axiom (iii) $1 = P(\Omega) = P(A \cup \bar A) = P(A) + P(\bar A)$. By the same
decomposition $P(A) \le 1$ because $P(\bar A) \ge 0$ by axiom (i).

Solution 2.2 (Exercise 2.2) We only prove (2.1) and (2.2) because the
proof of sub-additivity requires additional preliminary results. Notice that
$A \cup B = A \cup (\bar A \cap B)$ and then $P(A \cup B) = P(A) + P(\bar A \cap B)$. Further, $B =
(A \cap B) \cup (\bar A \cap B)$, hence $P(B) = P(A \cap B) + P(\bar A \cap B) = P(A \cap B) +
P(A \cup B) - P(A)$, which proves (2.1). We now observe that $A \subset B$ implies
$A \cap B = A$, then $P(B) = P(B \cap \Omega) = P(B \cap (A \cup \bar A)) = P(B \cap A) +
P(B \cap \bar A) = P(A) + P(B \cap \bar A)$, so $P(B) \ge P(A)$ because $P(B \cap \bar A) \ge 0$.


Solution 2.3 (Exercise 2.3) We need to show (i) to (iii) of Definition 2.1.2. Property
(i) is trivial because for all $A$, $P_B(A) = P(A|B)$ is a ratio of a non-negative
quantity $P(A \cap B)$ and a positive quantity $P(B)$. For (ii) we have that $P_B(\Omega) =
P(\Omega \cap B)/P(B) = P(B)/P(B) = 1$. For (iii)

$$P_B\Big(\bigcup_i A_i\Big) = P\Big(\bigcup_i A_i \,\Big|\, B\Big) = \frac{P\big((\bigcup_i A_i) \cap B\big)}{P(B)} = \frac{P\big(\bigcup_i (A_i \cap B)\big)}{P(B)} = \frac{1}{P(B)} \sum_i P(A_i \cap B) = \sum_i P(A_i|B) = \sum_i P_B(A_i).$$


Solution 2.4 (Exercise 2.4) Given that $E \cap \Omega = E$ and that $\Omega = \bigcup_{i=1}^n A_i$ we
can rewrite $E = E \cap \left(\bigcup_{i=1}^n A_i\right)$. Applying the distributive properties of the $\cap$ and $\cup$
operators, we obtain

$$E = E \cap \Big(\bigcup_{i=1}^n A_i\Big) = \bigcup_{i=1}^n (E \cap A_i).$$

Due to the fact that the sets $A_i$ in the partition are disjoint, so are the events
$E \cap A_i$. Finally, we notice that $P(E \cap A_i) = P(E|A_i)P(A_i)$ from Definition 2.1.7
and the proof is complete.

Solution 2.5 (Exercise 2.5) If $X$ and $Y$ are independent we have that
$E(XY) = E(X)E(Y)$, hence $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0$. We now
derive a counter example. Let $X$ be a continuous random variable defined
in $(-1, 1)$ with density $f(x) = \frac{1}{2}$. Define $Y = X^2$. We now calculate the
covariance. $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(X^3) - E(X)E(X^2) =
\frac{1}{2}\int_{-1}^{1} x^3\,dx - 0 \cdot E(X^2) = 0$.
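
The counter example can also be seen by Monte Carlo. This is a small illustrative sketch (the sample size and the variable names are arbitrary): the sample covariance between $X$ and $Y = X^2$ is close to zero, yet $Y$ is a deterministic function of $X$.

```r
# X uniform on (-1, 1) and Y = X^2: uncorrelated but clearly dependent.
set.seed(1)
x <- runif(1e5, min = -1, max = 1)
y <- x^2

cov_xy <- cov(x, y)          # close to 0, as computed analytically
cor_abs <- cor(abs(x), y)    # strong association between |X| and Y
c(cov_xy = cov_xy, cor_abs = cor_abs)
```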

Solution 2.6 (Exercise 2.6) $E(X) = 1 \cdot P(X = 1) + 0 \cdot P(X = 0) = p$ and
$\mathrm{Var}(X) = E\{X - E(X)\}^2 = (1 - p)^2 \cdot p + (0 - p)^2 \cdot (1 - p) = (1 - p)p\{(1 -
p) + p\} = p(1 - p)$. Finally, $\varphi(t) = E\{e^{itX}\} = e^{it} p + e^{0}(1 - p)$.

Solution 2.7 (Exercise 2.7) Noticing that the Binomial random variable can be
represented as $Y = \sum_{i=1}^n X_i$, where the $X_i$ are i.i.d. Ber($p$), applying (2.5), (2.6)
and (2.7), one obtains respectively mean, variance and characteristic function
of $Y$.

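Solution 2.8 (Exercise 2.8)

$$E(X) = \sum_{k=0}^{\infty} k \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=1}^{\infty} k \frac{\lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$

$$E(X^2) = \sum_{k=0}^{\infty} k^2 \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda}\lambda \sum_{k=1}^{\infty} k \frac{\lambda^{k-1}}{(k-1)!} = e^{-\lambda}\lambda \left[ \sum_{k=1}^{\infty} (k-1) \frac{\lambda^{k-1}}{(k-1)!} + \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} \right]$$

$$= e^{-\lambda}\lambda \left[ e^{\lambda} E(X) + e^{\lambda} \right] = e^{-\lambda}\lambda \left[ \lambda e^{\lambda} + e^{\lambda} \right] = \lambda^2 + \lambda.$$

Hence $\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2 = \lambda$. Finally

$$\varphi(t) = \sum_{k=0}^{\infty} e^{itk} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{it})^k}{k!} = e^{-\lambda} \exp\{\lambda e^{it}\} = \exp\{\lambda(e^{it} - 1)\}.$$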


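Solution 2.9 (Exercise 2.9) This is a simple exercise of calculus.

$$E(X) = \int_a^b \frac{x}{b - a}\,dx = \frac{b^2 - a^2}{2(b - a)} = \frac{a + b}{2}.$$

We now calculate

$$E(X^2) = \int_a^b \frac{x^2}{b - a}\,dx = \frac{b^3 - a^3}{3(b - a)},$$

therefore

$$\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2 = \frac{b^3 - a^3}{3(b - a)} - \frac{(a + b)^2}{4} = \frac{(b - a)^2}{12}.$$

Finally,

$$\varphi(t) = \int_a^b \frac{e^{itx}}{b - a}\,dx = \frac{e^{itb} - e^{ita}}{it(b - a)}.$$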


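Solution 2.10 (Exercise 2.10)

$$E(X) = \int_0^\infty x \lambda e^{-\lambda x}\,dx = -\int_0^\infty x \frac{d}{dx} e^{-\lambda x}\,dx = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx = \frac{1}{\lambda}.$$

Similarly

$$E(X^2) = \int_0^\infty x^2 \lambda e^{-\lambda x}\,dx = 2\int_0^\infty x e^{-\lambda x}\,dx = -\frac{2}{\lambda}\int_0^\infty x \frac{d}{dx} e^{-\lambda x}\,dx = \frac{2}{\lambda^2}.$$

Therefore, $\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2 = \lambda^{-2}$. For the characteristic function we
remember that, by Euler’s formula $e^{ix} = \cos(x) + i \sin(x)$, then $e^{ix}$ is always
bounded, hence

$$\varphi(t) = \int_0^\infty e^{itx} \lambda e^{-\lambda x}\,dx = \left[\frac{\lambda}{it - \lambda} e^{(it - \lambda)x}\right]_0^\infty = \frac{\lambda}{\lambda - it}.$$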

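Solution 2.11 (Exercise 2.11) Notice that $\frac{d}{dx} f(x) = -\frac{x - \mu}{\sigma^2} f(x)$, hence

$$E(X) = \int_{\mathbb{R}} x f(x)\,dx = -\sigma^2 \int_{\mathbb{R}} \left(-\frac{x - \mu}{\sigma^2}\right) f(x)\,dx + \mu = -\sigma^2 \int_{\mathbb{R}} \frac{d}{dx} f(x)\,dx + \mu = -\sigma^2 f(x)\Big|_{-\infty}^{\infty} + \mu = \mu.$$

Similarly, for the variance we notice that $\frac{d^2}{dx^2} f(x) = \left(\frac{(x - \mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(x)$, then

$$E(X - \mu)^2 = \int_{\mathbb{R}} (x - \mu)^2 f(x)\,dx = \sigma^4 \int_{\mathbb{R}} \left(\frac{(\mu - x)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(x)\,dx + \sigma^2 = \sigma^4 \int_{\mathbb{R}} \frac{d^2}{dx^2} f(x)\,dx + \sigma^2$$

$$= \sigma^4 \frac{d}{dx} f(x)\Big|_{-\infty}^{\infty} + \sigma^2 = -\sigma^2 (x - \mu) f(x)\Big|_{-\infty}^{\infty} + \sigma^2 = \sigma^2.$$

Further, for the characteristic function we have

$$E\{e^{itX}\} = \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2 - 2itx\sigma^2}{2\sigma^2}\right\} dx$$

$$= \exp\left\{\frac{(\sigma^2 it + \mu)^2 - \mu^2}{2\sigma^2}\right\} \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - (\mu + \sigma^2 it))^2}{2\sigma^2}\right\} dx$$

$$= \exp\left\{\frac{(\sigma^2 it + \mu)^2 - \mu^2}{2\sigma^2}\right\} = \exp\left\{\mu it - \frac{\sigma^2 t^2}{2}\right\}.$$

Solution 2.12 (Exercise 2.12) We apply (2.15) with the exponential densities
$f_X(x) = f_Y(x) = \lambda e^{-\lambda x}$. We only consider the case $z > 0$ because $f_Z(z) = 0$ for
$z < 0$.

$$f_Z(z) = \int_0^z f_X(z - y) f_Y(y)\,dy = \int_0^z \lambda e^{-\lambda(z - y)} \lambda e^{-\lambda y}\,dy = \int_0^z \lambda^2 e^{-\lambda z}\,dy = \lambda^2 z e^{-\lambda z}.$$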


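Solution 2.13 (Exercise 2.13) We use the convolution formula (2.15) only for
the particular case $\mu_1 = \mu_2 = 0$ and $\sigma_1^2 = \sigma_2^2 = 1$.

$$f_Z(z) = \int_{\mathbb{R}} \frac{e^{-\frac{(z - y)^2}{2}}}{\sqrt{2\pi}} \frac{e^{-\frac{y^2}{2}}}{\sqrt{2\pi}}\,dy = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\frac{z^2}{2} - y^2 + zy}\,dy = \frac{e^{-\frac{z^2}{4}}}{2\pi} \int_{\mathbb{R}} e^{-\left(y - \frac{z}{2}\right)^2}\,dy = \frac{e^{-\frac{z^2}{4}}}{2\sqrt{\pi}} \left[ \int_{\mathbb{R}} \frac{1}{\sqrt{\pi}} e^{-\left(y - \frac{z}{2}\right)^2}\,dy \right]$$

where the integral in the brackets evaluates to one because it is the integral of
the density of a $N\left(\frac{z}{2}, \frac{1}{2}\right)$ and

$$f_Z(z) = \frac{e^{-\frac{z^2}{2 \cdot 2}}}{\sqrt{2\pi \cdot 2}},$$

hence $Z \sim N(0, 2)$.

For the general case, instead of the convolution formula (2.15) we use the characteristic
function

$$\varphi_Z(t) = \varphi_X(t)\varphi_Y(t) = e^{\mu_1 it - \frac{\sigma_1^2 t^2}{2}}\, e^{\mu_2 it - \frac{\sigma_2^2 t^2}{2}} = \exp\left\{(\mu_1 + \mu_2) it - \frac{(\sigma_1^2 + \sigma_2^2) t^2}{2}\right\}.$$

Therefore, $Z \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.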

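Solution 2.14 (Exercise 2.14) We have that

$$\varphi_X(u) = \exp\{\lambda(e^{iu} - 1)\} = \left(\exp\left\{\frac{\lambda}{n}(e^{iu} - 1)\right\}\right)^n = \left(\varphi_{X^{(1/n)}}(u)\right)^n,$$

with $X^{(1/n)} \sim \mathrm{Poi}\left(\frac{\lambda}{n}\right)$.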

Solution 2.15 (Exercise 2.15) Clearly $E_\theta(T_n) = \frac{1}{n}\sum_{i=1}^n a_i E_\theta(X_i) = \frac{\theta}{n}\sum_{i=1}^n
a_i = \theta$. Now we look for the $a_i$ such that we get the minimal variance of $T_n$.
So we need to search for the minimum of

$$\mathrm{Var}_\theta(T_n) = \frac{1}{n^2} \sum_{i=1}^n a_i^2\, \mathrm{Var}_\theta(X_i)$$

which corresponds to minimizing the quantity $\sum_{i=1}^n a_i^2$ under the constraint
$\sum_{i=1}^n a_i = n$. We make use of Lagrange multipliers, therefore we construct the
function

$$f(\lambda, a_1, \ldots, a_n) = \sum_{i=1}^n a_i^2 - \lambda \left( \sum_{i=1}^n a_i - n \right)$$

and calculate the derivatives of $f(\lambda, a_i)$ with respect to all variables, i.e.

$$\frac{\partial}{\partial a_i} f(\lambda, a_1, \ldots, a_n) = 2a_i - \lambda = 0, \quad i = 1, \ldots, n,$$

$$\frac{\partial}{\partial \lambda} f(\lambda, a_1, \ldots, a_n) = -\left( \sum_{i=1}^n a_i - n \right) = 0.$$

Then we sum up all the equations involving the $a_i$’s and obtain $2\sum_{i=1}^n a_i - n\lambda
= 0$ from which in turn we have $\lambda = 2$. Now put back this $\lambda$ in $2a_i - \lambda = 0$ and
obtain the desired result $a_i = 1$.

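Solution 2.16 (Exercise 2.16) We minimize minus the log-likelihood function as
a function of $\mu$ and $\sigma^2$

$$h(\mu, \sigma^2) = -\ell_n(\mu, \sigma^2) = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log\sigma^2 + \sum_{i=1}^n \frac{(X_i - \mu)^2}{2\sigma^2}$$

$$\frac{\partial}{\partial\mu} h(\mu, \sigma^2) = -\frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu) = 0$$

$$\frac{\partial}{\partial\sigma^2} h(\mu, \sigma^2) = \frac{n}{2\sigma^2} - \frac{1}{2\sigma^4} \sum_{i=1}^n (X_i - \mu)^2 = 0$$

From the first equation we get $\hat\mu = \bar X_n$ and plugging this value into the second
equation we obtain $\hat\sigma^2 = S_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2$. We now need to verify that
at least one of the two second derivatives at the point $(\hat\mu, \hat\sigma^2)$ is positive and the
determinant of the Hessian matrix of second-order partial derivatives of $h(\mu, \sigma^2)$
evaluated at the point $(\hat\mu, \hat\sigma^2)$ is positive. So we calculate partial derivatives first.

$$\frac{\partial^2}{\partial\mu^2} h(\mu, \sigma^2) = \frac{n}{\sigma^2},$$

$$\frac{\partial^2}{\partial(\sigma^2)^2} h(\mu, \sigma^2) = -\frac{n}{2\sigma^4} + \frac{1}{\sigma^6} \sum_{i=1}^n (X_i - \mu)^2,$$

$$\frac{\partial^2}{\partial\mu\,\partial\sigma^2} h(\mu, \sigma^2) = \frac{1}{\sigma^4} \sum_{i=1}^n (X_i - \mu).$$

Now we recall that $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \hat\mu)^2$ and $\sum_{i=1}^n (X_i - \hat\mu) = 0$, hence

$$\frac{\partial^2}{\partial\mu^2} h(\mu, \sigma^2)\Big|_{\mu = \hat\mu, \sigma^2 = \hat\sigma^2} = \frac{n}{\hat\sigma^2} > 0,$$

$$\frac{\partial^2}{\partial(\sigma^2)^2} h(\mu, \sigma^2)\Big|_{\mu = \hat\mu, \sigma^2 = \hat\sigma^2} = -\frac{n}{2\hat\sigma^4} + \frac{n}{\hat\sigma^4} = \frac{n}{2\hat\sigma^4} > 0,$$

$$\frac{\partial^2}{\partial\mu\,\partial\sigma^2} h(\mu, \sigma^2)\Big|_{\mu = \hat\mu, \sigma^2 = \hat\sigma^2} = \frac{1}{\hat\sigma^4} \sum_{i=1}^n (X_i - \hat\mu) = 0.$$

Finally, we calculate the determinant of the Hessian matrix evaluated at the point
$(\hat\mu, \hat\sigma^2)$ to check if it is positive

$$|H(\hat\mu, \hat\sigma^2)| = \begin{vmatrix} \frac{\partial^2}{\partial\mu^2} h(\hat\mu, \hat\sigma^2) & \frac{\partial^2}{\partial\mu\,\partial\sigma^2} h(\hat\mu, \hat\sigma^2) \\ \frac{\partial^2}{\partial\mu\,\partial\sigma^2} h(\hat\mu, \hat\sigma^2) & \frac{\partial^2}{\partial(\sigma^2)^2} h(\hat\mu, \hat\sigma^2) \end{vmatrix} = \begin{vmatrix} \frac{n}{\hat\sigma^2} & 0 \\ 0 & \frac{n}{2\hat\sigma^4} \end{vmatrix} = \frac{1}{2}\frac{n^2}{\hat\sigma^6} > 0.$$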

Solution 2.17 (Exercise 2.17) For each observation we can write $P(X = x_i) =
p^{x_i}(1 - p)^{1 - x_i}$ hence

$$L_n(p) = \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} = p^{\sum_{i=1}^n x_i}(1 - p)^{n - \sum_{i=1}^n x_i}$$

hence

$$\ell_n(p) = \sum_{i=1}^n x_i \log p + \left(n - \sum_{i=1}^n x_i\right) \log(1 - p)$$

and

$$\frac{\partial}{\partial p} \ell_n(p) = \frac{\sum_{i=1}^n x_i}{p} - \frac{n - \sum_{i=1}^n x_i}{1 - p} = 0$$

from which we obtain

$$\frac{1 - p}{p} = \frac{n}{\sum_{i=1}^n x_i} - 1, \qquad \frac{1}{p} = \frac{n}{\sum_{i=1}^n x_i}, \qquad \hat p_n = \bar X_n.$$

2.8 Bibliographical notes

One of the basic books in probability to study the subject in more detail is,
for example, Billingsley (1986). This book also includes a complete treatment
of conditional expectation. For an updated account of several probability
distributions, one should not miss Johnson et al. (1994, 1995). Two textbooks on
statistical inference are the classical references Mood et al. (1974) and Casella and
Berger (2001).

3 Stochastic processes


When there is the need to model some form of dependence in a sample of 
observations, stochastic processes arise quite naturally. Dependence is a generic 
term which must be specified case by case in statistics and probability, but the 
most common situations include time and/or spatial dependence. In our context, 
the most interesting form of dependence is time dependence. Still, time dependency
can be modelled in a variety of forms as we will discuss in this chapter.

3.1 Definition and first properties 

We assume to have a probability space $(\Omega, \mathcal{A}, P)$. A real-valued, one-dimensional,
stochastic process is a family of random variables $\{X_\gamma, \gamma \in \Gamma\}$
defined on $\Omega \times \Gamma$ taking values in $\mathbb{R}$. The set $\Gamma$ may be any abstract set, but
we will restrict our attention to particular cases. For each $\gamma \in \Gamma$, the random
variable $X(\gamma, \omega)$ is a measurable map $X(\gamma, \cdot) : \Omega \to \mathbb{R}$. For a given fixed value
of $\omega$, say $\bar\omega$, the map $X(\gamma, \bar\omega)$, seen as a function of $\gamma \in \Gamma$, represents one
evolution of the process and the set

$$\{X(\gamma, \bar\omega), \gamma \in \Gamma\}$$

is called trajectory of the process. In what follows, to keep the notation compact,
we will adopt the notation $X_\gamma = X_\gamma(\omega) = X(\gamma, \omega) = X(\gamma)$ whenever needed.
We will now consider more concrete cases of stochastic processes.

Example 3.1.1 If $\Gamma = \mathbb{N}$ and the $X_n$, $n \in \mathbb{N}$, are independent and identically
distributed, the process $\{X_n, n \in \mathbb{N}\}$ represents an i.i.d. sample. This sequence is a
discrete time process, usually called Bernoulli or simply random sample and, as
seen in Chapter 2, it is the basis of elementary statistical inference.

Example 3.1.2 If $\Gamma$ is the axis of positive times $[0, \infty)$, then $\{X_t, t \ge 0\}$ is called
continuous time process and each trajectory represents the evolution in time of $X$.
This is the usual case of financial time series.

Consider for a while $\Gamma = [0, \infty)$. Each value of $\omega \in \Omega$ generates a trajectory
$X(t, \omega)$ as a function of $t \in \Gamma$. Each of these trajectories is called a path of the
process. In finance the dynamics of asset prices or their returns are modeled via
some kind of stochastic process. For example, the observation of a sequence of
quotations of some asset is the one and only (statistical) observation or path
from such models. This means that, unlike the i.i.d. case in which it is assumed
to have $n$ replications of the same random variable, in finance we have only
one observation for the model, which consists of the whole observed path of
the process.

For a given $t$, the distribution of $X_t(\omega)$ as a function of $\omega \in \Omega$ is called
finite dimensional distribution of the process. Clearly, one of the most interesting
questions in finance is how to predict future values of a process, say at time $t + h$,
$h > 0$, given the information (observation) at time $t$ (see Figure 3.1). In order to
provide an answer, one needs a correct specification of the statistical model, its
calibration on past and current data, and a proper simulation or prediction
scheme for the chosen model.

Processes are also classified according to their state space. The state space of 
the process is the set of values assumed by the process X. If the process takes a 
finite (or countable) number of values (states), the process is called a discrete state 
space process, otherwise the process has continuous state space. Combinations 
of state space and time space identify different classes of processes. Table 3.1 
reports some classes of processes. 



Figure 3.1 The graph represents three different trajectories for three different
values $\bar\omega_1$, $\bar\omega_2$, $\bar\omega_3$ for $t \ge 0$. For a given value of $t$, say $\bar t$, the set $\{X_{\bar t}(\omega), \omega \in \Omega\}$
represents the set of values of the process $X$ at time $\bar t$, from which the finite
dimensional distribution of $X_{\bar t}$ is obtained.

Table 3.1 Example of processes in terms of state space and time.

                            State space
Time          Discrete                      Continuous
Discrete      Random walk, Markov chain     ARIMA, GARCH
Continuous    Counting, Poisson             Telegraph, Lévy, Wiener, Diffusion


3.1.1 Measurability and filtrations

Consider a stochastic process $\{X(t), t \ge 0\}$. At each time $t$, it is possible to
associate to $X_t$ a $\sigma$-algebra denoted by $\mathcal{F}_t = \sigma(X(s); 0 \le s \le t)$ (the $\sigma$-algebra
generated by $X$ up to time $t$), which is the smallest $\sigma$-algebra which makes
$X(s, \omega)$ measurable for all $0 \le s \le t$. This $\sigma$-algebra is the smallest set of subsets
of $\Omega$ which allows one to evaluate probabilities of events related to $X(t)$. More
precisely, we can write

$$\sigma(X(s); 0 \le s \le t) = \sigma(X_s^{-1}(B), B \in \mathcal{B}(\mathbb{R}); 0 \le s \le t).$$

Definition 3.1.3 Let $\mathcal{A}_0 = \{\emptyset, \Omega\}$ be the trivial $\sigma$-algebra. A family $\mathbb{A} = \{\mathcal{A}_t,
t \ge 0\}$ of sub $\sigma$-algebras $\mathcal{A}_t \subset \mathcal{A}$, such that $\mathcal{A}_0 \subset \mathcal{A}_s \subset \mathcal{A}_t$, for $0 < s < t$, is
called filtration.

The filtration $\{\mathcal{F}_t, t \ge 0\}$, with $\mathcal{F}_t = \sigma(X(s); 0 \le s \le t)$, is called natural
filtration of the process $X_t$. Clearly, $X_t$ is $\mathcal{F}_t$-measurable for each $t \ge 0$, but
in general stochastic processes can be measurable with respect to generic
filtrations.

Definition 3.1.4 Let $\{\mathcal{A}_t, t \ge 0\}$ be a filtration. A process $\{X_t, t \ge 0\}$ is said to
be adapted to the filtration $\{\mathcal{A}_t, t \ge 0\}$ if for each $t$ the random variable $X_t$ is
$\mathcal{A}_t$-measurable.

Similarly, for $\mathbb{N}$-indexed stochastic processes, we can construct filtrations
$\{\mathcal{F}_n, n \in \mathbb{N}\}$ and natural filtrations, but the main point here is that filtrations are
increasing sequences of sub $\sigma$-algebras.

Example 3.1.5 (See Billingsley (1986), Th 5.2) Consider a binary experiment,
like throwing a coin. We can associate this experiment with the $\sigma$-algebra built
on $\Omega = [0, 1]$ in the following way: the random variable $X_1(\omega)$ takes value 0 if
$\omega \in [0, 1/2)$ and 1 if $\omega \in [1/2, 1]$, i.e. $X_1(\omega) = 0$ if $\omega \in [0, 1/2)$ and $X_1(\omega) = 1$
if $\omega \in [1/2, 1]$ (see bottom of Figure 3.2). The probability measure is the Lebesgue
measure of the corresponding interval, i.e. the length of the interval. Hence $P(X =
k) = P(\{\omega \in \Omega : X(\omega) = k\}) = \mu(\{\omega \in \Omega : X(\omega) = k\})$. Hence we have

$$P(X_1 = 0) = P(\{\omega \in \Omega : X_1(\omega) = 0\}) = \mu([0, 1/2)) = 1/2$$

and similarly $P(X_1 = 1) = 1/2$. The following $\sigma$-algebra

$$\mathcal{F}_1 = \{\emptyset, [0, 1/2), [1/2, 1], [0, 1]\},$$

makes $X_1$ measurable. We now define $X_2$ in the following way: $X_2(\omega) = 0$ if
$\omega \in [0, 1/4) \cup [1/2, 3/4)$ and $X_2(\omega) = 1$ if $\omega \in [1/4, 1/2) \cup [3/4, 1]$. Hence

$$P(X_2 = 0) = \mu([0, 1/4) \cup [1/2, 3/4)) = 1/4 + 1/4 = 1/2$$

and similarly $P(X_2 = 1) = 1/2$. In this case, the $\sigma$-algebra which makes $X_2$
measurable is the following

$$\mathcal{F}_2 = \{\emptyset, [0, 1/2), [1/2, 1], [0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1], [0, 1], \ldots\}$$

(where by '$\ldots$' we mean all possible unions of the intervals listed). Of course
we have $\mathcal{F}_1 \subset \mathcal{F}_2$. Now $\mathcal{F}_2$ makes both $X_2$ and $X_1$ measurable, while $\mathcal{F}_1$ makes
measurable $X_1$ but not $X_2$. In fact, there is no set in $\mathcal{F}_1$ which makes measurable
the event $X_2 = 1$. Similarly one can proceed with $X_3$. Thinking of the $n$-th throwing
of the coin, we will end with subdividing the interval $[0, 1]$ in subintervals of
length $1/2^n$ and obtain the $\sigma$-algebra $\mathcal{F}_n$ which includes all previous ones. Hence
$\{\mathcal{F}_i, i \ge 1\}$ is a filtration and the process $\{X_i, i \ge 1\}$ is adapted to it.

Figure 3.2 Example of building of a filtration. See text of Example 3.1.5.

Filtrations are then a way to describe the increase of information as time passes
by. In finance a filtration represents all the information available on the process
up to time $t$. Requiring a process to be measurable means that for that process it is
possible to evaluate probabilities at any given time instant.

Definition 3.1.6 A stochastic process $\{X_n, n \ge 1\}$ is said to be predictable with
respect to the filtration $\{\mathcal{F}_n, n \ge 1\}$ if $X_0 \in \mathcal{F}_0$ and $X_{n+1}$ is $\mathcal{F}_n$-measurable.

Hence, for a predictable process, the knowledge of $\mathcal{F}_n$ is sufficient to describe
the process at time $n + 1$.


3.1.2 Simple and quadratic variation of a process 

The notion of total variation or first order variation of a process $\{X_t, t \ge 0\}$
is linked to the differentiability of its paths seen as a function of $t$. Let $\Pi_n =
\Pi_n([0, t]) = \{0 = t_0 < t_1 < \cdots < t_i < \cdots < t_n = t\}$ be any partition of the interval
$[0, t]$ into $n$ intervals and denote by

$$\|\Pi_n\| = \max_{j=0,\ldots,n-1} (t_{j+1} - t_j)$$

the maximal step size of the partition $\Pi_n$, i.e. the mesh of the partition. The first
order variation of $X$ is defined as

$$V_t(X) = p\text{-}\lim_{\|\Pi_n\| \to 0} \sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)|.$$

If $X$ is differentiable, then $V_t(X) = \int_0^t |X'(u)|\,du$. If $V_t(X) < \infty$, then $X$ is said
to be of bounded variation on $[0, t]$. If this is true for all $t \ge 0$, then $X$ is said to
have bounded variation. The quadratic variation $[X, X]_t$ at time $t$ of a process
$X$ is defined as

$$[X, X]_t = p\text{-}\lim_{\|\Pi_n\| \to 0} \sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)|^2.$$

The limit exists for stochastic processes with continuous paths. In this case,
the notation $\langle X, X \rangle_t$ is usually adopted. The quadratic variation can also be
introduced as

$$[X, X]_t = p\text{-}\lim_{n \to \infty} \sum_{k=1}^{2^n} \left(X_{t \wedge k/2^n} - X_{t \wedge (k-1)/2^n}\right)^2,$$

where $a \wedge b = \min(a, b)$. If a process $X$ is differentiable, then it has quadratic
variation equal to zero. Moreover, total and quadratic variation are related by the
following inequality

$$\sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)| \ge \sum_{k=0}^{n-1} \frac{|X(t_{k+1}) - X(t_k)|^2}{\max_{\Pi_n} |X(t_{k+1}) - X(t_k)|} = \frac{\sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)|^2}{\max_{\Pi_n} |X(t_{k+1}) - X(t_k)|}. \qquad (3.1)$$

Therefore, if $X$ is continuous and has finite (and non-zero) quadratic variation, then its first
order variation is necessarily infinite: by continuity the maximal increment in the denominator of (3.1)
tends to zero while the numerator does not. Note that $V_t(X)$ and $[X, X]_t$ are stochastic
processes as well.
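
The contrast between first order and quadratic variation can be seen on discretized paths. The sketch below is an illustration under assumptions made here, not a construction from the text: the smooth path is $\sin(2\pi t)$ on $[0, 1]$, and the rough path is a random walk with steps $\pm\sqrt{t/n}$, used as a stand-in for a continuous path with non-zero quadratic variation. The helper name variations is hypothetical.

```r
# Sums of absolute and squared increments on finer and finer partitions.
set.seed(1)
t_end <- 1

variations <- function(path) {
  d <- diff(path)                       # increments X(t_{k+1}) - X(t_k)
  c(first_order = sum(abs(d)), quadratic = sum(d^2))
}

res <- sapply(c(100, 1000, 10000), function(n) {
  tt <- seq(0, t_end, length.out = n + 1)
  smooth <- sin(2 * pi * tt)            # differentiable path
  rough  <- c(0, cumsum(sample(c(-1, 1), n, replace = TRUE) * sqrt(t_end / n)))
  c(n = n, smooth = variations(smooth), rough = variations(rough))
})
res
```

For the smooth path the first order sums stabilize near $\int_0^1 |2\pi\cos(2\pi u)|\,du = 4$ and the quadratic sums shrink towards zero; for the rough path the quadratic sums stay at $t = 1$ while the first order sums grow like $\sqrt{n}$.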

3.1.3 Moments, covariance, and increments of stochastic 
processes 

The expected value and variance of a stochastic process are defined as

$$E(X_t) = \int_\Omega X(t, \omega)\,dP(\omega), \quad t \in [0, T],$$

and

$$\mathrm{Var}(X_t) = E\{X_t - E(X_t)\}^2, \quad t \in [0, T].$$

The $k$-th moment of $X_t$, $k \ge 1$, is defined, for all $t \in [0, T]$, as $E\{X_t^k\}$. These
quantities are well-defined when the corresponding integrals are finite. The
covariance function of the process for two time instants $s$ and $t$ is defined as

$$\mathrm{Cov}(X_s, X_t) = E\{(X_s - E(X_s))(X_t - E(X_t))\}.$$

The quantity $X_t - X_s$ is called the increment of the process from $s$ to $t$, $s < t$.

These quantities are useful in the description of stochastic processes that are 
usually introduced to model evolution subject to some stochastic shocks. There 
are different ways to introduce processes based on the characteristics one wants 
to model. A couple of the most commonly used approaches are the modeling of 
increments and/or the choice of the covariance function. 


3.2 Martingales 

Definition 3.2.1 Given a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\{\mathcal{F}_t, t \ge 0\}$
on $\mathcal{F}$, a martingale is a stochastic process $\{X_t, t \ge 0\}$ such that

(i) $E|X_t| < \infty$ for all $t \ge 0$

(ii) it is adapted to the filtration $\{\mathcal{F}_t, t \ge 0\}$

(iii) for each $0 \le s \le t < \infty$, it holds true that

$$E\{X_t | \mathcal{F}_s\} = X_s,$$

i.e. $X_s$ is the best predictor of $X_t$ given $\mathcal{F}_s$.

If in the definition above the equality '$=$' is replaced by '$\ge$', the process is
called submartingale, and if it is replaced by '$\le$', it is called supermartingale.

From the properties of the expected value operator it follows that if $X$ is a
martingale, then

$$E(X_s) = \text{(by definition of martingale)} = E(E\{X_t | \mathcal{F}_s\}) = \text{(by measurability of } X_s \text{ w.r.t. } \mathcal{F}_s \text{ and (2.26))} = E(X_t),$$

which means that martingales have a constant mean for all $t \ge 0$.

Again, a similar notion can be given for $\mathbb{N}$-indexed stochastic processes. In
this case, the martingale property is written as $E\{X_n | \mathcal{F}_{n-1}\} = X_{n-1}$. We present
now some discrete time processes for which it is easy to show their martingale
property.

3.2.1 Examples of martingales 

Example 3.2.2 (The random walk) Let $X_1, X_2, \ldots, X_n$ be a sequence of
independent random variables such that $E(X_n) = 0$ and $E|X_n| < \infty$, for all
$n \in \mathbb{N}$. Consider the random walk defined as

$$S_n = \sum_{i=1}^n X_i, \quad n \ge 1,$$

and define $\mathcal{F}_n = \sigma(X_i; i \le n)$. Then $S_n$ is clearly $\mathcal{F}_n$-measurable. Moreover,

$$E|S_n| \le \sum_{i=1}^n E|X_i| < \infty.$$

Finally, we check the martingale property (iii):

$$E\{S_n | \mathcal{F}_{n-1}\} = E\{X_n + S_{n-1} | \mathcal{F}_{n-1}\} = E\{X_n | \mathcal{F}_{n-1}\} + E\{S_{n-1} | \mathcal{F}_{n-1}\} = E(X_n) + S_{n-1} = S_{n-1}.$$

Clearly, the assumption $E(X_n) = 0$ is crucial to verify (iii).
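
The martingale property of the random walk can be explored by simulation. This is only a sketch under assumed choices: the increments are taken to be $\pm 1$ with equal probability (one of many zero-mean distributions), and the time index 25 used for the conditional check is arbitrary.

```r
# Simulate many random walk paths with zero-mean +/-1 increments.
set.seed(1)
n_steps <- 50; n_paths <- 20000
X <- matrix(sample(c(-1, 1), n_steps * n_paths, replace = TRUE),
            nrow = n_paths)                 # one path per row
S <- t(apply(X, 1, cumsum))                 # S[, n] holds S_n for each path

mean_S <- colMeans(S)                       # E(S_n) should be constant (zero)
range(mean_S)

# Given the past (here: the event S_25 > 0), the next increment still
# averages to zero, so E{S_26 | S_25 > 0} equals E{S_25 | S_25 > 0}.
incr_given_pos <- mean(X[S[, 25] > 0, 26])
incr_given_pos
```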

Example 3.2.3 (The likelihood ratio process) Consider a random sample of
i.i.d. random variables $X_1, X_2, \ldots, X_n$ distributed as $X$ which has density $f(x)$.
Denote the joint density of $X_1, X_2, \ldots, X_n$ by

$$f_n(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$$

and consider another distribution (i.e. a different statistical model) for $X$ whose
density is denoted by $g(x)$. Under the second model, we can consider the joint
density $g_n(\cdots)$ defined similarly to $f_n(\cdots)$. Notice that the assumption is still that
the $X_i$ are actually distributed with density $f(\cdot)$ and not $g(\cdot)$. Consider the stochastic
process defined as

$$L_n = L_n(X_1, \ldots, X_n) = \frac{g_n(X_1, X_2, \ldots, X_n)}{f_n(X_1, X_2, \ldots, X_n)}.$$

Then $\{L_n, n \ge 1\}$ is a martingale with respect to the filtration $\{\mathcal{F}_n, n \ge 1\}$ generated
by the $X_i$, i.e. $\mathcal{F}_n = \sigma(X_i; i \le n)$. Clearly, $L_n$ is the likelihood ratio
process (although we don’t mind about the parameter $\theta$ in this notation) and it is
$\mathcal{F}_n$-measurable. We now prove the martingale property

$$E\{L_n | \mathcal{F}_{n-1}\} = E\left\{ \frac{g_n(X_1, X_2, \ldots, X_n)}{f_n(X_1, X_2, \ldots, X_n)} \,\Big|\, \mathcal{F}_{n-1} \right\} = E\left\{ \frac{g_{n-1}(X_1, X_2, \ldots, X_{n-1})\, g(X_n)}{f_{n-1}(X_1, X_2, \ldots, X_{n-1})\, f(X_n)} \,\Big|\, \mathcal{F}_{n-1} \right\}$$

$$= L_{n-1} E\left\{ \frac{g(X_n)}{f(X_n)} \,\Big|\, \mathcal{F}_{n-1} \right\} = L_{n-1} E\left( \frac{g(X_n)}{f(X_n)} \right) = L_{n-1}$$

by independence of $X_n$ with respect to $\mathcal{F}_{n-1}$ and because

$$E\left( \frac{g(X_n)}{f(X_n)} \right) = \int_{\mathbb{R}} \frac{g(x_n)}{f(x_n)} f(x_n)\,dx_n = \int_{\mathbb{R}} g(x_n)\,dx_n = 1.$$

The latter also proves that each $L_n$ is such that $E|L_n| = 1 < \infty$ because $f(\cdot)$ and
$g(\cdot)$ are densities (and hence non-negative).
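
The constant mean $E(L_n) = 1$ can be checked numerically. The sketch below rests on assumptions chosen for illustration only: $f$ is the $N(0, 1)$ density, $g$ is the $N(\delta, 1)$ density with $\delta = 0.2$, and the data are generated under $f$. For this pair, $\log\{g(x)/f(x)\} = \delta x - \delta^2/2$.

```r
# Likelihood ratio process L_n = prod_i g(X_i) / f(X_i), data drawn from f.
set.seed(1)
n <- 10; n_paths <- 50000; delta <- 0.2          # illustrative values
X <- matrix(rnorm(n * n_paths), nrow = n_paths)  # X_i ~ f = N(0, 1)

log_ratio <- delta * X - delta^2 / 2             # log g(x)/f(x) for g = N(delta, 1)
L <- exp(t(apply(log_ratio, 1, cumsum)))         # L[, k] holds L_k for each path

mean_L <- colMeans(L)                            # should stay close to 1 for all k
mean_L
```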


Exercise 3.1 Consider the random walk $\{S_n, n \ge 1\}$. Prove that $\{Z_n = |S_n|,
n \ge 1\}$ is a sub-martingale.

Exercise 3.2 Consider the random walk $\{S_n, n \ge 1\}$ and the moment generating
function of $X_i$, $M(\alpha) = E\{e^{\alpha X_i}\}$. Let

$$Z_n = M(\alpha)^{-n} \exp\{\alpha S_n\}, \quad n \ge 1.$$

Prove that $\{Z_n, n \ge 1\}$ is a martingale with respect to $\{\mathcal{F}_n, n \ge 1\}$. $Z_n$ is called
the exponential martingale.

Exercise 3.3 Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables with
mean $E(X_i) = 0$ and $E(X_i^2) = \sigma^2 < \infty$ for all $i = 1, \ldots, n$. Prove that

$$Z_n = \left( \sum_{i=1}^n X_i \right)^2 - n\sigma^2, \quad n \ge 1,$$

is a martingale with respect to the filtration $\mathcal{F}_n = \sigma(X_i, i \le n)$.

Theorem 3.2.4 (Doob-Meyer decomposition) Let $\{X_n, n \ge 1\}$ be a stochastic
process adapted to the filtration $\{\mathcal{F}_n, n \ge 1\}$ and such that $E|X_n| < \infty$ for all
$n \ge 1$. Then, the following decomposition exists and it is unique:

$$X_n = M_n + P_n, \quad n \ge 1,$$

where $\{M_n, n \ge 1\}$ and $\{P_n, n \ge 1\}$ are, respectively, a martingale and a
predictable process with respect to $\mathcal{F}_n$, with $P_0 = 0$. $P_n$ is also called the
compensator.

Proof. We start by proving that the decomposition is unique. Assume that two
decompositions exist

$$X_n = M_n + P_n \quad \text{and} \quad X_n = M_n^* + P_n^*.$$

Let us introduce the predictable process $Y_n = M_n - M_n^* = P_n^* - P_n$ which is also
a martingale. Then

$$Y_{n+1} - Y_n = E\{Y_{n+1} | \mathcal{F}_n\} - E\{Y_n | \mathcal{F}_n\} = E\{Y_{n+1} - Y_n | \mathcal{F}_n\} = 0$$

from which we obtain that $Y_{n+1} = Y_n = Y_{n-1} = \cdots = Y_1 = Y_0$ but $Y_0 = P_0^* -
P_0 = 0$. Therefore, $M_n^* = M_n$ and $P_n^* = P_n$. The existence is proved by direct
construction of the decomposition. Let us start with $M_0 = X_0$ and

$$M_{n+1} = M_n + X_{n+1} - E\{X_{n+1} | \mathcal{F}_n\}, \quad \text{and hence} \quad P_n = X_n - M_n.$$

First we prove that $M_n$ is a martingale.

$$E\{M_{n+1} | \mathcal{F}_n\} = E\{M_n | \mathcal{F}_n\} + E\{X_{n+1} | \mathcal{F}_n\} - E\{X_{n+1} | \mathcal{F}_n\} = M_n,$$

and of course $E|M_n| < \infty$ and $M_n$ is $\mathcal{F}_n$-measurable. Now, let us check that $P_n$
is a predictable compensator. Clearly, $P_0 = X_0 - M_0 = 0$. Moreover,

$$P_n = X_n - M_n = X_n - (M_{n-1} + X_n - E\{X_n | \mathcal{F}_{n-1}\}) = -M_{n-1} + E\{X_n | \mathcal{F}_{n-1}\}$$

is $\mathcal{F}_{n-1}$-measurable and hence predictable.
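
The construction used in the existence part of the proof can be followed step by step on a simple case. The sketch below assumes, for illustration only, that $X_n = S_n^2$ where $S_n$ is a random walk with $\pm 1$ steps and $S_0 = 0$; then $E\{X_{k+1} | \mathcal{F}_k\} = S_k^2 + 1$, which is coded in the hypothetical helper cond_exp_next.

```r
# Doob decomposition of X_n = S_n^2 for a +/-1 random walk, built with the
# recursion M_{k+1} = M_k + X_{k+1} - E{X_{k+1} | F_k}, P_k = X_k - M_k.
set.seed(1)
n <- 20
steps <- sample(c(-1, 1), n, replace = TRUE)
S <- c(0, cumsum(steps))               # S_0, S_1, ..., S_n
X <- S^2                               # X_0, X_1, ..., X_n

cond_exp_next <- function(s) s^2 + 1   # E{X_{k+1} | F_k} when S_k = s

M <- numeric(n + 1)
M[1] <- X[1]                           # M_0 = X_0
for (k in 1:n) {
  M[k + 1] <- M[k] + X[k + 1] - cond_exp_next(S[k])
}
P <- X - M                             # compensator

rbind(X = X, M = M, P = P)[, 1:6]
```

In this case the compensator is deterministic, $P_k = k$, and the martingale part is $M_k = S_k^2 - k$, in agreement with Exercise 3.3 for $\sigma^2 = 1$.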

The next couple of properties, whose proof can be found in Section 7 of 
Klebaner (2005), are useful to describe martingales. 

Theorem 3.2.5 Let $M_t$ be a martingale with finite second moments, i.e.
$E(M_t^2) < \infty$ for all $t$. Then, its quadratic variation exists and $M_t^2 - [M, M]_t$ is a
martingale.

Theorem 3.2.6 Let $M_t$ be a martingale with $M_0 = 0$. If for some $t$, $M_t$ is not
zero, then $[M, M]_t > 0$. Conversely, if $[M, M]_t = 0$, then $M_s = 0$ almost surely
for all $s \le t$.

3.2.2 Inequalities for martingales

We present here, without proof, some inequalities for martingales which may be
useful in what follows.

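Theorem 3.2.7 (Doob’s maximal inequality) Assume that $\{X_n, n \ge 1\}$ is a
non-negative submartingale. Then, for any $\lambda > 0$, we have

$$P\Big(\max_{k \le n} X_k \ge \lambda\Big) \le \frac{1}{\lambda} E\{X_n\}$$

and if $X_n$ is a non-negative supermartingale

$$P\Big(\sup_{k \le n} X_k \ge \lambda\Big) \le \frac{1}{\lambda} E\{X_0\}.$$

For a continuous-time submartingale with continuous paths

$$P\Big(\sup_{t \le T} X_t \ge \lambda\Big) \le \frac{1}{\lambda} E\{\max(X_T, 0)\}. \qquad (3.3)$$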

Theorem 3.2.8 (Doob’s maximal $L^2$ inequality) Assume that $\{X_n, n \ge 1\}$ is a
non-negative submartingale. Then,

$$E\Big(\max_{k \le n} X_k^2\Big) \le 4 E(X_n^2). \qquad (3.4)$$

The $L^2$ inequality is a particular case of the more general $L^p$ inequality, which
requires a bit more regularity on the integrability of the process. We present it for
continuous-time martingales.

Theorem 3.2.9 (Doob’s maximal $L^p$ inequality) Assume that $\{X_t, t \ge 0\}$ is a
non-negative submartingale with continuous paths. Then, for $p > 1$,

$$E\Big(\sup_{t \le T} X_t^p\Big) \le \Big(\frac{p}{p-1}\Big)^p E(X_T^p), \qquad (3.5)$$

provided that $E|X_T|^p < \infty$.

Theorem 3.2.10 (Hájek-Rényi inequality) Let $\{S_n = \sum_{j=1}^n X_j, n \ge 1\}$ be a
martingale with $E(S_n^2) < \infty$. Let $\{b_n, n \ge 1\}$ be a positive nondecreasing sequence.
Then, for any $\lambda > 0$

$$P\Big(\max_{1 \le j \le n} \Big|\frac{S_j}{b_j}\Big| \ge \lambda\Big) \le \frac{1}{\lambda^2} \sum_{j=1}^n \frac{E(X_j^2)}{b_j^2}. \qquad (3.6)$$

For the particular sequence $\{b_n = 1, n \ge 1\}$, the Hájek-Rényi inequality
reduces to the Kolmogorov inequality for martingales, which is interesting
in itself.

Theorem 3.2.11 (Kolmogorov inequality for martingales) Let $\{S_n = \sum_{j=1}^n X_j, n \ge 1\}$
be a martingale with $E(S_n^2) < \infty$. Then, for any $\lambda > 0$, we have

$$P\Big(\max_{1 \le j \le n} |S_j| \ge \lambda\Big) \le \frac{1}{\lambda^2} E(S_n^2). \qquad (3.7)$$
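
A quick Monte Carlo experiment gives a feeling for how conservative the Kolmogorov bound can be. The following is only a sketch under our own assumptions: the martingale is a symmetric $\pm 1$ random walk, so that $E(S_n^2) = n$, and the values of `n`, `lambda` and `nsim` are arbitrary.

```r
# Empirical check of the Kolmogorov inequality for a symmetric random walk.
set.seed(123)
n <- 100        # number of steps
lambda <- 20    # threshold
nsim <- 5000    # Monte Carlo replications

exceed <- logical(nsim)
for (i in 1:nsim) {
  S <- cumsum(sample(c(-1, 1), n, replace = TRUE))
  exceed[i] <- max(abs(S)) >= lambda    # did max_j |S_j| reach lambda?
}
p_hat <- mean(exceed)    # estimate of P(max_j |S_j| >= lambda)
bound <- n / lambda^2    # E(S_n^2) / lambda^2, with E(S_n^2) = n
c(estimate = p_hat, bound = bound)
```

The estimated probability should stay below the bound; the inequality holds for every square integrable martingale, so it is not expected to be tight for this specific one.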

Theorem 3.2.12 Let $\{X_t, t \ge 0\}$ be a martingale, then for $p \ge 1$, we have

$$P\Big(\sup_{s \le t} |X_s| \ge \lambda\Big) \le \frac{1}{\lambda^p} \sup_{s \le t} E(|X_s|^p). \qquad (3.8)$$

Theorem 3.2.13 (Burkholder-Davis-Gundy inequality) Let $\{X_t, t \ge 0\}$ be a
martingale, null at zero. Then, there exist constants $c_p$ and $C_p$, depending only
on $p$, such that for $1 \le p < \infty$

$$c_p E\big([X, X]_T^{p/2}\big) \le E\Big(\sup_{t \le T} |X_t|^p\Big) \le C_p E\big([X, X]_T^{p/2}\big). \qquad (3.9)$$

Moreover, if $X_t$ is also continuous, the result holds also for $0 < p < 1$.


3.3 Stopping times 

Definition 3.3.1 A random variable $T : \Omega \to [0, +\infty)$ is called stopping time
with respect to the filtration $\{\mathcal{F}_t, t \ge 0\}$ if the event $(T \le t)$ is $\mathcal{F}_t$-measurable, i.e.

$$\{\omega \in \Omega : T(\omega) \le t\} = (T \le t) \in \mathcal{F}_t, \quad \forall t \ge 0.$$

The definition of stopping time essentially says that we can decide whether the
event $(T \le t)$ occurred or not on the basis of the information up to time $t$.
The random variable $T = c$, with $c$ a constant, is a trivial stopping time because
$(T \le t)$ is either $\Omega$ (when $c \le t$) or $\emptyset$ (when $c > t$), and both belong to $\mathcal{F}_t$.
Moreover, if $T$ is a stopping time, then $T + s$, with $s > 0$
a constant, is also a stopping time. Indeed,

$$(T + s \le t) = (T \le t - s) \in \mathcal{F}_{t-s} \subset \mathcal{F}_t.$$

An important fact concerning stopping times is related to the first passage times.
Let $\{X_n, n \ge 1\}$ be a sequence of random variables (for example, a random
walk) and fix a value $\beta$. We denote by $T_\beta = \inf\{n : X_n \ge \beta\}$ the first passage
time of the process $X$ to the threshold $\beta$. By construction, $(T_\beta \le k) \in \mathcal{F}_k$ where
$\mathcal{F}_k = \sigma(X_1, X_2, \ldots, X_k)$. So, the first passage time is a stopping time. Similarly,
with the addition of technical bits, one can prove the same for continuous time
processes. On the contrary, the last passage time $S_\beta = \sup\{n : X_n \ge \beta\}$ is not a
stopping time. Indeed, $(S_\beta \le k) \notin \mathcal{F}_k$ because we need to observe the trajectory
of $X$ up to infinity.
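
The difference between the two random times can be made concrete on a simulated path. In the sketch below (an illustration of ours; the helper `first_passage` and the choice of a $\pm 1$ random walk are assumptions) the first passage time is found by scanning the path forward and stopping as soon as the threshold is reached, while the last passage time, even when restricted to a finite horizon, requires the whole observed trajectory.

```r
set.seed(123)
n <- 200
beta <- 5
X <- cumsum(sample(c(-1, 1), n, replace = TRUE))   # random walk path

# First passage: the decision at step k uses X_1, ..., X_k only.
first_passage <- function(x, level) {
  for (k in seq_along(x)) if (x[k] >= level) return(k)
  Inf   # level not reached within the observed horizon
}
T_beta <- first_passage(X, beta)

# Last passage over the observed horizon: needs the entire path.
hits <- which(X >= beta)
S_beta <- if (length(hits) > 0) max(hits) else -Inf
c(first = T_beta, last = S_beta)
```

Note that `first_passage` never looks beyond the current index, which is the computational counterpart of $(T_\beta \le k) \in \mathcal{F}_k$.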

Theorem 3.3.2 Let $T$ and $S$ be two stopping times with respect to $\{\mathcal{F}_t, t \ge 0\}$.
Then

$$S \vee T, \quad S \wedge T \quad \text{and} \quad S + T$$

are also stopping times.

Proof. We need to prove measurability of each event. Indeed

$$(S \vee T \le t) = (S \le t) \cap (T \le t) \in \mathcal{F}_t \quad \text{and} \quad (S \wedge T \le t) = (S \le t) \cup (T \le t) \in \mathcal{F}_t.$$

For $S + T$, we prove that $(S + T > t) \in \mathcal{F}_t$. Let us decompose the event
$(S + T > t)$ as follows:

$$(S + T > t) = \Omega \cap (S + T > t)$$
$$= \big((T = 0) \cup (0 < T < t) \cup (T \ge t)\big) \cap (S + T > t)$$
$$= (T = 0, S + T > t) \cup (0 < T < t, S + T > t) \cup (T \ge t, S + T > t)$$

and prove that each term satisfies measurability. For the first term we have

$$(T = 0, S + T > t) = (T = 0) \cap (S > t) = (T = 0) \cap (S \le t)^c \in \mathcal{F}_t.$$

We decompose further the last term

$$(T \ge t, S + T > t) = (T \ge t, S + T > t) \cap \big((S = 0) \cup (S > 0)\big)$$
$$= (T \ge t, S + T > t, S = 0) \cup (T \ge t, S + T > t, S > 0)$$
$$= (T > t) \cup (T \ge t, S > 0)$$
$$= (T \le t)^c \cup \big((T < t)^c \cap (S > 0)\big)$$

and all terms are $\mathcal{F}_t$-measurable. We need to rewrite the term $(0 < T < t, S + T > t)$ as follows:

$$(0 < T < t, S + T > t) = \bigcup_{s > 0} (s < T < t, S > t - s)$$
$$= \bigcup_{s > 0} \big((s < T < t) \cap (S \le t - s)^c\big)$$

and notice that

$$(s < T < t) = (T < t) \cap (T \le s)^c.$$

In a very similar way one can prove the following.

Exercise 3.4 Let $\{T_n, n \ge 1\}$ be a sequence of stopping times with respect to the
filtration $\{\mathcal{F}_t, t \ge 0\}$. Prove that

$$\inf_{n \ge 1} T_n \quad \text{and} \quad \sup_{n \ge 1} T_n$$

are also stopping times.

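Definition 3.3.3 (Stopped $\sigma$-algebra) Let $T$ be a stopping time with respect to
the filtration $\{\mathcal{F}_t, t \ge 0\}$. We define $\mathcal{F}_T$, the $\sigma$-algebra stopped at $T$, as the following
set

$$\mathcal{F}_T = \{A \in \mathcal{F}_\infty : A \cap (T \le t) \in \mathcal{F}_t, t \ge 0\}$$

where $\mathcal{F}_\infty = \sigma\big(\bigcup_t \mathcal{F}_t\big)$.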

3.4 Markov property 

A discrete time stochastic process $\{X_n, n \ge 1\}$ is said to be Markovian if the
conditional distribution of $X_k$ given the past $(X_{k-1}, X_{k-2}, \ldots)$ equals the
conditional distribution of $X_k$ given $X_{k-1}$ solely, i.e.

$$\mathcal{L}(X_k | X_{k-1}, X_{k-2}, \ldots) = \mathcal{L}(X_k | X_{k-1}).$$

This means that the ‘future’ of the process $X_k$ depends only on the ‘present’
$X_{k-1}$ and the knowledge of the ‘past’ $X_{k-2}, \ldots, X_1$ does not contribute any
additional information. Examples of Markov sequences are the i.i.d. sample or
the autoregressive models like the following

$$X_n = \theta X_{n-1} + \epsilon_n, \quad X_0 = x_0, \quad \epsilon_i \ \text{i.i.d.} \sim N(0, \sigma^2),$$

with $\theta$ some parameter. The class of Markov processes is quite wide and the
characterization of the Markov property relies on the state space (discrete versus
continuous) and time specification (again, discrete versus continuous). We limit
the treatment of Markov processes to what is relevant to the subsequent part of
this book.

3.4.1 Discrete time Markov chains 

An important class of Markov processes are the so-called Markov chains.
Consider a discrete time stochastic process $\{X_n, n \ge 1\}$ with discrete (possibly
countable) state space $S$. For simplicity we denote the states of the process
with $j$, but in practical cases one can imagine having $S = \{0, 1, 2, \ldots\}$ or
$S = \{0, \pm 1, \pm 2, \ldots\}$, or, if the state space is finite, $S = \{1, 2, \ldots, d\}$. We denote
by $p_{ij}^n = P(X_{n+1} = j | X_n = i)$, i.e. $\{p_{ij}^n, j \in S\}$ is the conditional distribution
of the process $X$ at time $n + 1$ given that its position at time $n$ is the state $i$.

Definition 3.4.1 The process $\{X_n, n \ge 1\}$, with state space $S$, is called Markov
chain if

$$P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j | X_n = i) = p_{ij}^n, \quad \forall i, j \in S, \qquad (3.10)$$

with $i_{n-1}, \ldots, i_1, i_0 \in S$, when all probabilities are well defined.

The quantities $p_{ij}^n$ are called one step transition probabilities. These transition
probabilities also depend on the instant $n$. Most interesting to us are homogeneous
Markov chains.

Definition 3.4.2 A Markov chain such that

$$P(X_{n+1} = j | X_n = i) = p_{ij},$$

for all $n \ge 0$ is called a homogeneous Markov chain.

For homogeneous Markov chains, the transition probabilities are independent of
the time $n$ and, in principle, one can describe the whole structure of the process
taking into account only the initial state $X_0$ and the first transition to $X_1$, i.e.
$p_{ij} = P(X_1 = j | X_0 = i)$. The one step transition probabilities constitute the
transition matrix $P$ which, for a state space like $S = \{0, 1, \ldots\}$, we can express in
the form:

$$P = \begin{pmatrix}
p_{00} & p_{01} & p_{02} & \cdots & p_{0j} & \cdots \\
p_{10} & p_{11} & p_{12} & \cdots & p_{1j} & \cdots \\
p_{20} & p_{21} & p_{22} & \cdots & p_{2j} & \cdots \\
\vdots & \vdots & \vdots & & \vdots & \\
p_{i0} & p_{i1} & p_{i2} & \cdots & p_{ij} & \cdots \\
\vdots & \vdots & \vdots & & \vdots &
\end{pmatrix}$$

Every transition matrix satisfies the following properties

$$p_{ij} \ge 0, \quad i, j \in S, \qquad \sum_{j \in S} p_{ij} = 1 \quad \text{for each } i \in S.$$

Example 3.4.3 (Random walk) Consider a particle which moves on the real line
starting from the origin, i.e. $S_0 = X_0 = 0$. At each instant $n$ the particle jumps to
the right or to the left according to the following distribution: $P(X_n = +1) = p$,
$P(X_n = -1) = q$, with $p + q = 1$. The position of the particle at time $n$ is
$S_n = \sum_{i=1}^n X_i$. Assume that the $X_i$ are all independent. The state space of the process
is $S = \{0, \pm 1, \pm 2, \ldots\}$ and the transition from one state to another is given by
$S_n = S_{n-1} + X_n$. Clearly $\{S_n, n \ge 1\}$ is a homogeneous Markov chain. Indeed,

$$P(S_{n+1} = j + 1 | S_n = j, S_{n-1}, \ldots, S_0)$$
$$= P(S_n + X_{n+1} = j + 1 | S_n = j, S_{n-1}, \ldots, S_0)$$
$$= P(X_{n+1} = +1 | S_n = j, S_{n-1}, \ldots, S_0)$$
$$= p$$

because the $X_i$’s are independent, and similarly for
$P(S_{n+1} = j - 1 | S_n = j, S_{n-1}, \ldots, S_0)$. The transition matrix for $S_n$ has an infinite number of
rows and columns which can be represented as follows:

$$P = \begin{pmatrix}
\ddots & \ddots & \ddots & & & \\
& q & 0 & p & & \\
& & q & 0 & p & \\
& & & q & 0 & p \\
& & & & \ddots & \ddots
\end{pmatrix}$$

where the matrix is filled with zeros outside the two diagonals of $p$’s and $q$’s.

Example 3.4.4 (Random walk with reflecting barriers) Assume that the particle
of Example 3.4.3 can move only on the finite state space $S = \{a, \ldots, 0, \ldots, b\}$
such that if the particle is in state $a$ at time $n$, it is pushed to $a + 1$ at instant $n + 1$
with probability 1; if it is in state $b$ at time $n$ it is pushed back to state $b - 1$ at
time $n + 1$; otherwise it behaves as in Example 3.4.3. The transition matrix is
then represented as follows:

$$P = \begin{pmatrix}
0 & 1 & & & & \\
q & 0 & p & & & \\
& q & 0 & p & & \\
& & \ddots & \ddots & \ddots & \\
& & & q & 0 & p \\
& & & & 1 & 0
\end{pmatrix}$$

and $a$ and $b$ are called reflecting barriers. The first row of $P$ represents the
distribution $P(X_n = k | X_{n-1} = a)$, $k = a, a + 1, \ldots, 0, \ldots, b - 1, b$, and similarly
for the subsequent rows.

Example 3.4.5 (Random walk with absorbing barriers) Assume the same
setup of Example 3.4.4, with the difference that when the particle reaches the
barrier $a$ or $b$, the random walk is stopped in those states, i.e. if the particle is in
state $a$ (or $b$) at time $n$, it remains in state $a$ (respectively $b$) with probability 1
for all times $k \ge n$; otherwise it behaves as in Example 3.4.3. The transition
matrix is then represented as follows:

$$P = \begin{pmatrix}
1 & 0 & & & & \\
q & 0 & p & & & \\
& q & 0 & p & & \\
& & \ddots & \ddots & \ddots & \\
& & & q & 0 & p \\
& & & & 0 & 1
\end{pmatrix}$$

and $a$ and $b$ are called absorbing barriers.
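
The transition matrices of Examples 3.4.4 and 3.4.5 are easy to build explicitly. The function `rw_matrix` below is a hypothetical helper written for this illustration (it is not part of any package); the barriers $a = -2$, $b = 2$ and the value $p = 0.6$ are arbitrary choices.

```r
# States a, a+1, ..., b; p = probability of a step to the right, q = 1 - p.
rw_matrix <- function(a, b, p, barrier = c("reflecting", "absorbing")) {
  barrier <- match.arg(barrier)
  stopifnot(b > a, p >= 0, p <= 1)
  states <- a:b
  d <- length(states)
  q <- 1 - p
  P <- matrix(0, d, d, dimnames = list(states, states))
  for (k in seq_len(d - 2) + 1) {   # interior states behave as in Example 3.4.3
    P[k, k - 1] <- q
    P[k, k + 1] <- p
  }
  if (barrier == "reflecting") {
    P[1, 2] <- 1                    # pushed from a to a + 1
    P[d, d - 1] <- 1                # pushed from b back to b - 1
  } else {
    P[1, 1] <- 1                    # remains in a forever
    P[d, d] <- 1                    # remains in b forever
  }
  P
}

P_refl <- rw_matrix(-2, 2, p = 0.6, barrier = "reflecting")
P_abs  <- rw_matrix(-2, 2, p = 0.6, barrier = "absorbing")
P_refl
rowSums(P_abs)   # each row is a probability distribution
```

Row and column names carry the state labels, so that for instance `P_refl["0", "1"]` is the probability of moving from state 0 to state 1.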


The transition matrix of a homogeneous Markov chain represents the short
term evolution of the process. In many cases, the long term evolution of the
process is more interesting, in particular in asymptotic statistics. Let us introduce
the $n$-step transition distribution denoted as $\{p_{ij}^{(n)}, j \in S\}$, which represents the
distribution of the process $X_{m+n}$ given that the present state is $X_m = i$. We
can write:

$$p_{ij}^{(n)} = P(X_{m+n} = j | X_m = i).$$

Notice that $p_{ij}^{(n)}$ does not depend on the time $m$ because the chain is homogeneous,
so we can also write

$$p_{ij}^{(n)} = P(X_n = j | X_0 = i),$$

and clearly $p_{ij}^{(1)} = p_{ij}$.

Theorem 3.4.6 The following relation holds

$$p_{ij}^{(n)} = \sum_{h \in S} p_{ih}^{(n-1)} p_{hj}. \qquad (3.11)$$

Proof. Indeed,

$$p_{ij}^{(n)} = \frac{P(X_n = j, X_0 = i)}{P(X_0 = i)} = \sum_{h \in S} \frac{P(X_n = j, X_{n-1} = h, X_0 = i)}{P(X_0 = i)}$$
$$= \sum_{h \in S} P(X_n = j | X_{n-1} = h, X_0 = i) P(X_{n-1} = h | X_0 = i)$$
$$= \sum_{h \in S} P(X_n = j | X_{n-1} = h) p_{ih}^{(n-1)} = \sum_{h \in S} P(X_1 = j | X_0 = h) p_{ih}^{(n-1)},$$

from which we obtain (3.11).

Let us denote by $P^{(n)} = [p_{ij}^{(n)}, i, j \in S]$ the $n$-step transition matrix, then the
following relation holds

$$P^{(n)} = P^{(n-1)} \cdot P \qquad (3.12)$$

in the sense of matrix multiplication (matrices with possibly infinite number of
rows and columns). From (3.12) we can see that

$$P^{(2)} = P \cdot P = P^2$$

and, recursively, we obtain

$$P^{(n)} = P^n.$$

Equation (3.11) is a particular case of the Chapman-Kolmogorov equation
presented in the next theorem.

Theorem 3.4.7 (Chapman-Kolmogorov equation) Given a homogeneous Markov
chain $\{X_n, n \ge 1\}$ with transition probabilities $p_{ij}$, the following relations
hold

$$p_{ij}^{(m+n)} = \sum_{h \in S} p_{ih}^{(m)} p_{hj}^{(n)},$$

for each $m, n \ge 0$ and for all $i, j \in S$ or, in matrix form:

$$P^{(m+n)} = P^{(m)} \cdot P^{(n)}.$$

The proofs follow the same steps as the proof of Equation (3.11). Along with the
transition distributions, the finite dimensional distribution of the process at time
$n$ is of interest. Let us denote by $\pi_k^n$ the probability of finding $X_n$ in state $k$ at
time $n$

$$\pi_k^n = P(X_n = k), \quad \forall k \in S.$$

Similarly, we denote the law of $X_0$ by $\pi^0$.
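
For a finite state space the Chapman-Kolmogorov equation is just a statement about matrix powers and can be checked directly. The sketch below uses a three-state transition matrix chosen by us purely for illustration; `mat_pow` is a hypothetical helper, since base R has no matrix power operator.

```r
# A small homogeneous chain on three states (illustrative values only).
P <- matrix(c(0.5, 0.5, 0.0,
              0.2, 0.5, 0.3,
              0.0, 0.4, 0.6), nrow = 3, byrow = TRUE)

# n-step transition matrix P^(n) = P^n by repeated multiplication.
mat_pow <- function(P, n) {
  out <- diag(nrow(P))              # P^(0) is the identity matrix
  for (k in seq_len(n)) out <- out %*% P
  out
}

m <- 2; n <- 3
lhs <- mat_pow(P, m + n)                   # P^(m+n)
rhs <- mat_pow(P, m) %*% mat_pow(P, n)     # P^(m) P^(n)
max(abs(lhs - rhs))                        # zero up to rounding error
```

Each row of `lhs` is itself a probability distribution: it is the law of the chain after $m + n$ steps given the starting state.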

Theorem 3.4.8 Given a homogeneous Markov chain with transition matrix $P$ and
initial distribution $\pi^0$, the distribution of $X_n$ is given by

$$\pi_k^n = \sum_{h \in S} \pi_h^0 p_{hk}^{(n)}$$

for each $k \in S$, or in matrix form:

$$\pi^n = \pi^0 P^n.$$

Proof. Indeed, we can write

$$P(X_n = k) = \sum_{h \in S} P(X_n = k, X_0 = h) = \sum_{h \in S} P(X_n = k | X_0 = h) P(X_0 = h).$$

In a similar way, we can derive the joint distribution of any $n$-tuple $(X_{k_1}, \ldots, X_{k_n})$.

Theorem 3.4.9 Given a homogeneous Markov chain, with transition matrix $P$ and
initial distribution $\pi^0$, the joint distribution of $(X_{k_1}, X_{k_2}, \ldots, X_{k_n})$, with
$0 < k_1 < k_2 < \cdots < k_n$, is given by

$$P(X_{k_1} = h_1, \ldots, X_{k_n} = h_n) = \sum_{h \in S} \pi_h^0 \, p_{h h_1}^{(k_1)} \, p_{h_1 h_2}^{(k_2 - k_1)} \cdots p_{h_{n-1} h_n}^{(k_n - k_{n-1})}$$

for each $h_1, h_2, \ldots, h_n \in S$.

Proof. Set

$$\nu_{h_1, \ldots, h_n}^{k_1, \ldots, k_n} = P(X_{k_1} = h_1, \ldots, X_{k_n} = h_n)$$

and write

$$\nu_{h_1, \ldots, h_n}^{k_1, \ldots, k_n} = P(X_{k_1} = h_1, \ldots, X_{k_{n-1}} = h_{n-1})$$
$$\times P(X_{k_n} = h_n | X_{k_1} = h_1, \ldots, X_{k_{n-1}} = h_{n-1})$$
$$= P(X_{k_1} = h_1, \ldots, X_{k_{n-1}} = h_{n-1}) \, p_{h_{n-1} h_n}^{(k_n - k_{n-1})}.$$

Then the proof follows by iterating the same argument.
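
Theorems 3.4.8 and 3.4.9 translate into a few lines of matrix algebra when the state space is finite. The example below is a sketch under our own assumptions: the three-state matrix `P`, the initial law `pi0` and the helpers `mat_pow` and `pi_n` are illustrative and not taken from the text.

```r
P <- matrix(c(0.5, 0.5, 0.0,
              0.2, 0.5, 0.3,
              0.0, 0.4, 0.6), nrow = 3, byrow = TRUE)
pi0 <- c(1, 0, 0)                 # chain started in the first state

mat_pow <- function(P, n) {       # P^n by repeated multiplication
  out <- diag(nrow(P))
  for (k in seq_len(n)) out <- out %*% P
  out
}

# Theorem 3.4.8: law of X_n as a row vector, pi^n = pi^0 P^n.
pi_n <- function(n) drop(pi0 %*% mat_pow(P, n))

# Theorem 3.4.9 with two time points k1 < k2:
# P(X_k1 = h1, X_k2 = h2) = sum_h pi0_h p^(k1)_{h h1} p^(k2 - k1)_{h1 h2}
k1 <- 2; k2 <- 5
joint <- diag(pi_n(k1)) %*% mat_pow(P, k2 - k1)   # entry [h1, h2]

sum(joint)                        # the joint law sums to one
colSums(joint) - pi_n(k2)         # second marginal agrees with pi^(k2)
```

The sum over $h$ in the theorem is absorbed into `pi_n(k1)`, which is why the joint law of two time points reduces to a diagonal matrix times a power of `P`.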

Theorem 3.4.10 (Strong Markov property) Let $\tau$ be a stopping time. Given
the knowledge of $\tau = n$ and $X_\tau = i$, no other information on $X_0, X_1, \ldots, X_\tau$ is
needed to determine the conditional distribution of the event $X_{\tau+1} = j$, i.e.

$$P(X_{\tau+1} = j | X_\tau = i, \tau = n) = p_{ij}.$$

Proof. Denote by $V_n$ the set of vectors $x = (i_0, i_1, \ldots, i_n)$ such that the event
$\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\}$ implies the event $\tau = n$ and $X_\tau = i$ (note that we
should ask $i_n = i$). Then, by the law of total probability we have

$$P(X_{\tau+1} = j, X_\tau = i, \tau = n) = \sum_{x \in V_n} P(X_{n+1} = j, X_n = x_n, \ldots, X_0 = x_0) \qquad (3.13)$$

and the second term can be rewritten as follows:

$$\sum_{x \in V_n} P(X_{n+1} = j | X_n = x_n, \ldots, X_0 = x_0) P(X_n = x_n, \ldots, X_0 = x_0).$$

Remember that if $x \in V_n$ then $\tau = n$ and $X_\tau = i$, hence $x_n = i$ and by the
Markov property we have that

$$P(X_{n+1} = j | X_n = x_n, \ldots, X_0 = x_0) = p_{ij}$$

and then

$$P(X_{\tau+1} = j, X_\tau = i, \tau = n) = p_{ij} \sum_{x \in V_n} P(X_n = x_n, \ldots, X_0 = x_0)$$
$$= p_{ij} P(X_\tau = i, \tau = n).$$

The statement follows by dividing the expression (3.13) by $P(X_\tau = i, \tau = n)$.

The strong Markov property is sometimes written as

$$\mathcal{L}(X_{\tau+1} | X_\tau, X_{\tau-1}, \ldots) = \mathcal{L}(X_{\tau+1} | X_\tau)$$

and it essentially states that, conditioning on a proper random time, the chain
loses its memory. For example, let $\tau_i = \min\{n \ge 1 : X_n = i\}$ be the first passage
time of the Markov chain through state $i$. The random variable $\tau_i$ is a stopping time.
Then, the new chain $\{\tilde X_k\} = \{X_{\tau_i + k}\}$ is again a Markov chain with the same
transition probabilities, i.e. it behaves as the original Markov chain
but started from $i$.

Let $\tau_i = \min\{n \ge 1 : X_n = i\}$ be the first passage time and define

$$f_{ii} = P(\tau_i < +\infty | X_0 = i).$$

This quantity represents the probability that $X_n$ returns into state $i$ at least once,
given that it started from state $i$. Notice that the initial state $X_0$ does not enter
the definition of the stopping time $\tau_i$.

The strong Markov property implies that the probability that the chain returns
to state $i$, given that it has visited that state $n - 1$ times, is again $f_{ii}$. Therefore,
the probability of visiting two times the state $i$ is $f_{ii}^2$, the probability of visiting
three times the same state is $f_{ii}^2 \cdot f_{ii} = f_{ii}^3$ and, in general, the probability of $n$
visits is $f_{ii}^n$.

Definition 3.4.11 Let $S$ be the state space of a homogeneous Markov chain. Given
$i \in S$, the state $i$ is called transient if $f_{ii} < 1$; the state $i$ is called recurrent if
$f_{ii} = 1$.

Notice that if $f_{ii} < 1$ the probability that a chain comes back $n$ times into state
$i$ is $f_{ii}^n$ and this converges to 0 as $n \to \infty$; then, from a certain instant the
chain will never visit the state $i$ again. This is why the state is called transient.
Conversely, if $f_{ii} = 1$, the chain will visit that state infinitely often. Further, a
state $i$ such that $p_{ii} = 1$ is called absorbing state. We have already encountered
absorbing states in Example 3.4.5.

Definition 3.4.12 Let $P$ be the transition matrix of a homogeneous Markov chain.
A probability distribution $\pi$ on $S$ is called invariant distribution if it satisfies the
following equation

$$\pi = \pi P.$$

The existence of an invariant distribution means that at each instant the marginal
law of the chain remains the same, i.e. the chain is stable. Conditions for the
existence and uniqueness of the invariant distribution $\pi$ can be given; see e.g.
the books of Feller (1968), Feller (1971) and Cox and Miller (1965). We only
provide the following result.

Theorem 3.4.13 Assume that there exists an invariant distribution $\pi = (\pi_1, \pi_2, \ldots)$
and that the initial distribution of the Markov chain is $\pi$, i.e.
$\pi_j = P(X_0 = j)$, $j \in S$. Then, the marginal distribution of $X_n$ is $\pi$ for all $n$.

Proof. Denote by $\nu^n$ the distribution at time $n$

$$\nu_j^n = \sum_{i \in S} \pi_i p_{ij}^{(n)},$$

where we have set the initial distribution equal to $\pi$. We prove by induction that
if $\pi$ is an invariant distribution then

$$\sum_{i \in S} \pi_i p_{ij}^{(n)} = \pi_j, \quad \forall n, j.$$

This will imply that

$$\nu_j^n = \pi_j, \quad \forall n, j.$$

Indeed, by definition of invariant distribution it is true that, for $n = 1$,
$\sum_{i \in S} \pi_i p_{ij} = \pi_j$. Assume that it holds for $n - 1$, that is $\sum_{i \in S} \pi_i p_{ij}^{(n-1)} = \pi_j$;
then

$$\sum_{i \in S} \pi_i p_{ij}^{(n)} = \sum_{i \in S} \pi_i \sum_{h \in S} p_{ih}^{(n-1)} p_{hj} = \sum_{h \in S} \sum_{i \in S} \pi_i p_{ih}^{(n-1)} p_{hj} = \sum_{h \in S} \pi_h p_{hj} = \pi_j.$$

Therefore, it holds for all $n$.
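
For a finite chain the invariant distribution can be obtained by solving the linear system $\pi(P - I) = 0$ together with the constraint $\sum_j \pi_j = 1$. The sketch below is an illustration of ours: the three-state matrix is arbitrary, and the approach of replacing one redundant equation by the normalization assumes that the invariant distribution is unique, which holds for this irreducible example.

```r
P <- matrix(c(0.5, 0.5, 0.0,
              0.2, 0.5, 0.3,
              0.0, 0.4, 0.6), nrow = 3, byrow = TRUE)
d <- nrow(P)

# Solve pi (P - I) = 0 with sum(pi) = 1: transpose the system and
# replace its last (redundant) equation by the normalization constraint.
A <- t(P) - diag(d)
A[d, ] <- 1
b <- c(rep(0, d - 1), 1)
pi_inv <- solve(A, b)
pi_inv                            # here equal to (8, 20, 15) / 43

# Theorem 3.4.13: starting from pi_inv, the marginal law does not move.
law <- pi_inv
for (n in 1:20) law <- drop(law %*% P)
max(abs(law - pi_inv))            # zero up to rounding error
```

Starting from any other initial law, `law` would change from step to step before settling down, whereas the invariant law is a fixed point of the map $\pi \mapsto \pi P$.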

3.4.2 Continuous time Markov processes 

For a complete treatment of continuous time Markov processes we suggest
Revuz and Yor (2004). In this section we just collect the building
blocks for subsequent considerations.

For a continuous time stochastic process, we define the transition probability
as the function $P_{s,t}$, $s < t$, such that

$$P\{X_t \in A | \sigma(X_u, u \le s)\} = P_{s,t}(X_s, A) \quad \text{a.s.}$$

where $A$ is some set in the state space of $X$. In the case of continuous time processes,
the Chapman-Kolmogorov equation is presented in the following form: for any
$s < t < v$, we have

$$\int P_{s,t}(x, dy) P_{t,v}(y, A) = P_{s,v}(x, A).$$

If the transition probability is homogeneous, i.e. $P_{s,t}$ depends on $s$ and $t$ only
through the difference $t - s$, one usually writes $P_t$ for $P_{0,t}$, and then the
Chapman-Kolmogorov equation becomes

$$P_{t+s}(x, A) = \int P_s(x, dy) P_t(y, A), \quad \forall s, t \ge 0,$$

from which we see that the family $\{P_t, t \ge 0\}$ forms a semigroup.

Definition 3.4.14 Let $X$ be a stochastic process adapted to a filtration
$\mathbb{F} = \{\mathcal{F}_t, t \ge 0\}$, with transition probability $P_{s,t}$, $s < t$. The process is said to
be a Markov process with respect to the filtration $\mathbb{F}$ if, for any non-negative
function $f$ and any pair $(s, t)$, $s < t$, we have

$$E\{f(X_t) | \mathcal{F}_s\} = P_{s,t} f(X_s) \quad \text{a.s.}$$

The process is homogeneous if the corresponding transition probability is
homogeneous and in this case we write

$$E\{f(X_t) | \mathcal{F}_s\} = P_{t-s} f(X_s) \quad \text{a.s.}$$

Definition 3.4.15 (Strong Markov property) Let $X$ be a stochastic process
adapted to the filtration $\mathbb{F} = \{\mathcal{F}_t, t \ge 0\}$, $T$ a stopping time with respect to $\mathbb{F}$ and
$\mathcal{F}_T$ the stopped $\sigma$-algebra. The process $X$ possesses the strong Markov property if

$$P(X_{T+t} \in A | \mathcal{F}_T) = P(X_{T+t} \in A | X_T).$$

The strong Markov property means that if we ‘start afresh’, in the terminology of
Itô and McKean (1996), the process $X$ at the stopping time $T$, the new process
possesses the same properties as $X$. So, this is the strong lack of memory of a
Markov process.

3.4.3 Continuous time Markov chains 

Although we will see that most of the processes considered later are Markov 
processes, we introduce here a few details about continuous time Markov chains 
that will be useful later on. 

Definition 3.4.16 A continuous time process $\{X_t, t \ge 0\}$ with discrete state space
$S$ is called continuous time Markov chain if

$$P(X_{t+u} = j | X_u = i, X_s = x_s, 0 \le s < u) = P(X_{t+u} = j | X_u = i),$$

where $i, j \in S$ and $\{x_s, 0 \le s < u\} \subset S$.

When the process is stationary, the analysis is much simpler and close to that of
discrete time Markov chains. From now on we assume stationarity and hence,
the transition probabilities take the form:

$$p_{ij}(t) = P(X_{t+u} = j | X_u = i) = P(X_t = j | X_0 = i).$$

We denote the state vector by $s(t)$, with components $s_k(t) = P(X_t = k)$, $k \in S$. This
vector obeys

$$s(t + u) = s(u) P(t)$$

with $P(t) = [p_{ij}(t), i, j \in S]$ the transition matrix. From the above we obtain

$$s(t + u + v) = s(v) P(t + u) = s(u + v) P(t) = s(v) P(u) P(t)$$
$$= s(v + t) P(u) = s(v) P(t) P(u)$$

from which we obtain

$$P(t + u) = P(u) P(t) = P(t) P(u), \quad \forall t, u \ge 0,$$

that is, the Chapman-Kolmogorov equation. As usual, the transition matrix
satisfies

$$\sum_{j \in S} p_{ij}(t) = 1, \quad \text{for each state } i \in S.$$

We denote by $P(0)$ the initial transition probability matrix

$$P(0) = \lim_{t \downarrow 0} P(t) = I,$$

and it is possible to prove that the transition probability matrix $P(t)$ is continuous
for all $t \ge 0$. This allows us to define the next object, which plays an important
role for continuous time Markov chains.


Definition 3.4.17 (Infinitesimal generator) The matrix $Q$ defined as

$$\lim_{t \downarrow 0} \frac{P(t) - I}{t} = P'(0) = Q$$

is called infinitesimal generator of the Markov chain.

For a discrete time Markov chain it corresponds to $P - I$. The sum of the rows in
$Q = [q_{ij}]$ is zero and such that

$$\sum_{j \ne i} q_{ij} = -q_{ii}$$

where

$$q_{ij} = \lim_{t \downarrow 0} \frac{p_{ij}(t)}{t} \ge 0, \quad i \ne j,$$

and $q_{ii} \le 0$.

The quantities $q_{ij}$ are called rates. They are derivatives of probabilities and
reflect a change in the transition probability from state $i$ towards state $j$. The
name ‘rates’ will become more precise when we discuss the Poisson process. Let
$q_i = -q_{ii} > 0$, then $\sum_j |q_{ij}| = 2 q_i$. For a Markov chain with finite state space,
the quantities $q_i$ are always finite, but in general this fact should not be expected.
This allows for an additional classification of the states of the Markov chain: a
state $j$ is called instantaneous if $q_j = \infty$. In this case, when the process enters
the state $j$ it immediately leaves it. Assume there is a chain where all states are
not instantaneous. Consider a small time $\Delta t$. Then,

$$P(X(t + \Delta t) = j | X(t) = i) = q_{ij} \Delta t + o(\Delta t), \quad i \ne j,$$

$$P(X(t + \Delta t) = i | X(t) = i) = 1 - q_i \Delta t + o(\Delta t).$$

The above is a generalization of the properties of the Poisson process (see
(3.16)) which allows for a direct link between the term ‘rates’ and Poissonian event
arrival rates.

Theorem 3.4.18 Let $Q$ be the infinitesimal generator of a continuous time
Markov chain. Then, the transition probability matrix $P(t)$ is differentiable for
all $t \ge 0$ and satisfies

$$P'(t) = P(t) Q \quad \text{(the forward equation)},$$

$$P'(t) = Q P(t) \quad \text{(the backward equation)}.$$

The solution of $P'(t) = P(t) Q$ with initial condition $P(0) = I$ is

$$P(t) = \exp(Qt).$$

In the previous theorem $\exp(A)$ is the matrix exponential of matrix $A$, which is
given by the power series formula

$$\exp(A) = I + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \cdots$$

with $A^k$ the matrix power, i.e. $k$-fold matrix multiplication. Package msm has a
function called MatrixExp to calculate this matrix exponential reliably, which is
not completely trivial as mentioned in Moler and van Loan (2003). Obviously,
the infinitesimal generator and the backward/forward formulas can be defined for
general continuous time Markov processes and the most important case is the
family of diffusion processes. We will discuss these points later in the book.
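
To see the formula $P(t) = \exp(Qt)$ at work without relying on any package, one can sum the power series directly for a small generator. The sketch below makes several assumptions of ours: a two-state chain with arbitrary rates `rate12` and `rate21`, a hypothetical helper `expm_series` that simply truncates the series, and the well-known closed form of the two-state transition matrix as a cross-check. A truncated series is adequate for this tiny, well-scaled example but is not a substitute for a robust routine.

```r
# Two-state generator: rate12 is the rate 1 -> 2, rate21 the rate 2 -> 1.
rate12 <- 2; rate21 <- 1
Q <- matrix(c(-rate12,  rate12,
               rate21, -rate21), nrow = 2, byrow = TRUE)

# Truncated power series exp(A) = I + A + A^2/2! + ...
expm_series <- function(A, terms = 60) {
  out <- diag(nrow(A))
  term <- diag(nrow(A))
  for (k in seq_len(terms)) {
    term <- term %*% A / k        # now equal to A^k / k!
    out <- out + term
  }
  out
}

tm <- 0.7
Pt <- expm_series(Q * tm)

# Closed form for the two-state chain, used only as a cross-check.
e <- exp(-(rate12 + rate21) * tm)
Pt_exact <- matrix(c(rate21 + rate12 * e, rate12 - rate12 * e,
                     rate21 - rate21 * e, rate12 + rate21 * e),
                   nrow = 2, byrow = TRUE) / (rate12 + rate21)

max(abs(Pt - Pt_exact))           # agreement up to rounding error
rowSums(Pt)                       # rows of P(t) sum to one
max(abs(Pt %*% Q - Q %*% Pt))     # forward and backward right-hand sides agree
```

The last line reflects the fact that $P(t)$ and $Q$ commute, which is why the forward and backward equations share the same solution.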

3.5 Mixing property 

The notion of mixing corresponds to a relaxation of the notion of independence
in several ways. We know that two events are independent if $P(A \cap B) = P(A) P(B)$
or, equivalently, if $P(A \cap B) - P(A) P(B) = 0$. The mixing property of a
sequence of random events $A_1, A_2, \ldots$ is stated as

$$|P(A_i \cap A_j) - P(A_i) P(A_j)| \to 0$$

at some rate.

Definition 3.5.1 Let $X_i$ be a stochastic process and let $\mathcal{F}_k = \sigma(X_i, i \le k)$
and $\mathcal{F}_k^\infty = \sigma(X_i, i = k, \ldots, \infty)$. Let $A_1 \in \mathcal{F}_k$ and $A_2 \in \mathcal{F}_{k+n}^\infty$ be any two events.
Then, if

$$|P(A_1 \cap A_2) - P(A_1) P(A_2)| \le \varphi(n) P(A_1), \quad \forall n,$$

we say that the process $X_i$ is $\varphi$-mixing.

If $P(A_1) = 0$, the mixing condition trivially holds, but when $P(A_1) > 0$, we can
write the $\varphi$-mixing condition as follows:

$$|P(A_2 | A_1) - P(A_2)| \le \varphi(n).$$

In these terms, the $\varphi$-mixing property says that the conditional probability of $n$
steps in the future, i.e. $A_2$, with respect to the present, i.e. $A_1$, is close to the
unconditional probability of $A_2$ up to a function depending on $n$. When the two
quantities match for all $n$ we have independence, but usually, it is required that

$$\lim_{n \to \infty} \varphi(n) = 0$$

at some rate and the rate specifies the strength of dependence between elements
of the sequence $X_i$. Sometimes the condition is stated in a more general form
which includes the supremum

$$\sup\{|P(A_2 | A_1) - P(A_2)| : A_1 \in \mathcal{F}_k, A_2 \in \mathcal{F}_{k+n}^\infty\} \le \varphi(n).$$


Example 3.5.2 (Billingsley) Consider a Markov chain with finite state space $S$.
Let

$$\varphi(n) = \max_{u, v \in S} \Big| \frac{p_{uv}^{(n)}}{p_v} - 1 \Big|$$

and suppose it is finite. Let

$$A_1 = \{(X_{k-i}, \ldots, X_k) \in H_1\} \in \mathcal{F}_k$$

with $H_1$ a subset of $i + 1$ states, and

$$A_2 = \{(X_{k+n}, \ldots, X_{k+n+j}) \in H_2\} \in \mathcal{F}_{k+n}^\infty$$

with $H_2$ a subset of $j + 1$ states. Then

$$|P(A_1 \cap A_2) - P(A_1) P(A_2)| \le \sum p_{u_0} p_{u_0 u_1} \cdots p_{u_{i-1} u_i} \big| p_{u_i v_0}^{(n)} - p_{v_0} \big| p_{v_0 v_1} \cdots p_{v_{j-1} v_j}$$

where the sums extend to the states $(u_0, u_1, \ldots, u_i) \in H_1$ and $(v_0, \ldots, v_j) \in H_2$.
Noticing that $\sum_v p_{uv} = 1$ and using the definition of $\varphi(n)$, we obtain the mixing
property.

We will discuss different types of mixing sequences in the second part of the book.


3.6 Stable convergence 

Definition 3.6.1 Assume there are two probability spaces $(\Omega, \mathcal{F}, P)$ and
$(\Omega', \mathcal{F}', P')$. Assume further that there exists a $Q(\omega, d\omega')$, the transition
probability from $(\Omega, \mathcal{F})$ into $(\Omega', \mathcal{F}')$. Set

$$\tilde\Omega = \Omega \times \Omega', \quad \tilde{\mathcal{F}} = \mathcal{F} \otimes \mathcal{F}', \quad \tilde P(d\omega, d\omega') = P(d\omega) Q(\omega, d\omega').$$

The space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$ is called an extension of $(\Omega, \mathcal{F}, P)$. Denote by $\tilde E$ the expected
value with respect to $\tilde P$.

This extension of the original space includes events which are independent
(orthogonal) to the original space. It is a construction used to make the definition
of some limits work, like the one presented in the following. Assume we have a
sequence of real-valued random variables $\{X_n, n \ge 1\}$ defined on $(\Omega, \mathcal{F}, P)$ and
a random variable $X$ defined on $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$, which is an extension of $(\Omega, \mathcal{F}, P)$.
Assume that $\mathcal{G}$ is a sub-$\sigma$-field of $\mathcal{F}$.

Definition 3.6.2 ($\mathcal{G}$-stable convergence in law) The sequence $X_n$ is said to
converge in distribution $\mathcal{G}$-stably to $X$ if

$$E\{f(X_n) Y\} \to \tilde E\{f(X) Y\}$$

as $n \to \infty$ for all bounded continuous functions $f$ on $\mathbb{R}$ and all $\mathcal{G}$-measurable
random variables $Y$. We will write

$$X_n \stackrel{d_s(\mathcal{G})}{\longrightarrow} X.$$

When $\mathcal{G} = \mathcal{F}$, we say that $X_n$ stably converges in distribution or converges in
distribution stably and we write

$$X_n \stackrel{d_s}{\longrightarrow} X.$$

This notion of convergence was introduced in Rényi (1963) and Aldous and
Eagleson (1978) and further extended to the class of stochastic processes of
interest for this book in Jacod (1997, 2002).

Theorem 3.6.3 If $G_n$ is a sequence of $\mathcal{G}$-measurable random variables such that
$G_n \stackrel{p}{\to} G$ and if $X_n \stackrel{d_s(\mathcal{G})}{\longrightarrow} X$, then $(X_n, G_n) \stackrel{d_s(\mathcal{G})}{\longrightarrow} (X, G)$.

The previous theorem says that joint stable convergence can be obtained by
mixing finite dimensional stable convergence with convergence in probability. This
kind of result will be useful in the proof of asymptotic properties of estimators
for financial models in the high frequency setup.


3.7 Brownian motion 

In general, the introduction of a stochastic process is intended to model some
aspects of stochastic evolution and, of course, financial time series exhibit this
kind of evolution. Brownian motion or Wiener process is the basic stochastic
process in continuous time, like the Gaussian is for continuous random variables.
Brownian motion can be introduced using different characterizations and here we
present one.

Definition 3.7.1 Brownian motion is a stochastic process $\{B_t, t \ge 0\}$ starting
from zero almost surely, i.e. $P(B_0 = 0) = 1$, with the following properties:

(i) is a process with independent increments: $B_t - B_s$ is independent of
$B_v - B_u$ when $(s, t) \cap (u, v) = \emptyset$;

(ii) is a process with stationary increments, i.e. the distribution of $B_t - B_s$,
$t > s \ge 0$, depends only on the difference $t - s$ but not on $t$ and $s$ separately;

(iii) is a process with Gaussian increments, i.e.

$$B_t - B_s \sim N(0, t - s).$$

Sometimes $B_t$ is denoted by $W_t$ because Brownian motion is also called Wiener
process. The Brownian motion is a process which runs its trajectories at infinite
speed or, more formally, it has infinite first order variation $V_t(B) = +\infty$. It
is possible to prove that the trajectories of $B_t$ are nowhere differentiable with
probability one but, nevertheless, these trajectories are also continuous. At the
same time the quadratic variation of the Brownian motion is $t$, i.e. $[B, B]_t = t$,
as the next theorem shows.

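Theorem 3.7.2 The quadratic variation of $B$ is $[B, B]_t = t$ in quadratic mean
and almost surely.

Proof. It is sufficient to prove that

$$\lim_{n \to \infty} \sum_{k=1}^{2^n} \Delta_{n,k}^2 = t$$

almost surely and in quadratic mean, where

$$\Delta_{n,k} = B\Big(\frac{kt}{2^n}\Big) - B\Big(\frac{(k-1)t}{2^n}\Big).$$

We start by proving the limit in the quadratic mean. Note that the random variables
$\Delta_{n,k}^2$ are independent and such that $E(\Delta_{n,k}^2) = t 2^{-n}$. We have to prove that

$$\lim_{n \to \infty} E\Big\{\sum_{k=1}^{2^n} (\Delta_{n,k}^2 - t 2^{-n})\Big\}^2 = 0.$$

Then

$$E\Big\{\sum_{k=1}^{2^n} (\Delta_{n,k}^2 - t 2^{-n})\Big\}^2 = \sum_{k=1}^{2^n} E\{\Delta_{n,k}^2 - t 2^{-n}\}^2$$
$$= \sum_{k=1}^{2^n} \{E(\Delta_{n,k}^4) + t^2 2^{-2n} - t 2^{-n+1} E(\Delta_{n,k}^2)\}$$
$$= 2 t^2 \sum_{k=1}^{2^n} 2^{-2n} = 2 t^2 2^{-n} \to 0 \quad \text{as } n \to \infty.$$

To prove almost sure convergence, we first use Chebyshev’s inequality (2.23)

$$P\Big\{\Big|\sum_{k=1}^{2^n} (\Delta_{n,k}^2 - t 2^{-n})\Big| > \epsilon\Big\} \le \frac{1}{\epsilon^2} E\Big\{\sum_{k=1}^{2^n} (\Delta_{n,k}^2 - t 2^{-n})\Big\}^2 = \frac{2 t^2 2^{-n}}{\epsilon^2},$$

from which we see that

$$\sum_{n=1}^{\infty} P\Big\{\Big|\sum_{k=1}^{2^n} (\Delta_{n,k}^2 - t 2^{-n})\Big| > \epsilon\Big\} < \infty.$$

Finally, the Borel-Cantelli Lemma (2.4.14) along with the definition of almost
sure convergence provides the result.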
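
The dyadic sums used in the proof can be observed on a simulated trajectory. The sketch below is our own illustration: it simulates one Brownian path on the grid $k t 2^{-14}$ (the grid size, the horizon `t_end` and the helper `variation` are assumptions) and computes, along coarser and finer dyadic partitions of the same path, both the sum of squared increments and the sum of absolute increments.

```r
set.seed(123)
t_end <- 1
n <- 14
N <- 2^n
# Increments over the finest dyadic grid, each distributed as N(0, t_end / N).
dB <- rnorm(N, mean = 0, sd = sqrt(t_end / N))
B <- c(0, cumsum(dB))             # B(0), B(t_end / N), ..., B(t_end)

# Quadratic and first order variation along the partition with 2^m intervals.
variation <- function(m) {
  idx <- seq(1, N + 1, by = 2^(n - m))   # grid points k * t_end / 2^m
  d <- diff(B[idx])
  c(m = m, quadratic = sum(d^2), first_order = sum(abs(d)))
}
t(sapply(c(4, 8, 12, 14), variation))
```

As the partition is refined, the column `quadratic` should stabilize around `t_end`, while the column `first_order` keeps growing, in agreement with the infinite first order variation of the Brownian motion.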










Then again, by the previous result, the continuity of the trajectories of $B_t$ and
formula (3.1), one derives that the simple variation of the Brownian motion is
necessarily infinite.

Theorem 3.7.3 Let $\{B_t, t \ge 0\}$ be a Brownian motion. The next processes are all
Brownian motions:

(i) reflection: $-B_t$;

(ii) shift: $B_{t+a} - B_a$ for each fixed $a > 0$;

(iii) self-similarity: $\hat B_t = c B(t/c^2)$ for each fixed $c \in \mathbb{R}$, $c \ne 0$;

(iv) time inversion:

$$\tilde B_t = \begin{cases} 0, & \text{if } t = 0, \\ t B(1/t), & \text{if } t \ne 0. \end{cases}$$

Proof. We only prove the self-similarity property (iii). Clearly $\hat B(0) = 0$.
Since $B_t \sim N(0, t)$, also $\hat B_t \sim N(0, t)$. Consider the increments $B(t/c^2) -
B(s/c^2) \sim (B_t - B_s)/c$ and $B(u/c^2) - B(v/c^2) \sim (B_u - B_v)/c$. They are
clearly independent if $(s, t) \cap (u, v) = \emptyset$.
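
Properties (iii) and (iv) can be checked by simulation at the level of second moments. This is only a sanity check under our own assumptions (the values of `t0`, `s0`, `c0` and the number of replications are arbitrary), not a proof: for a Brownian motion we expect variance $t$ at time $t$ and covariance $\min(s, t)$ between times $s$ and $t$.

```r
set.seed(123)
nsim <- 100000

# (iii) self-similarity: c0 * B(t0 / c0^2) should be N(0, t0).
t0 <- 2; c0 <- 3
B_hat <- c0 * rnorm(nsim, sd = sqrt(t0 / c0^2))
c(mean = mean(B_hat), variance = var(B_hat))      # compare with 0 and t0

# (iv) time inversion: for s0 < t0 the covariance of
# s0 * B(1/s0) and t0 * B(1/t0) should be min(s0, t0) = s0.
s0 <- 0.5
B_inv_t <- rnorm(nsim, sd = sqrt(1 / t0))         # B(1/t0), the earlier time
B_inv_s <- B_inv_t +
  rnorm(nsim, sd = sqrt(1 / s0 - 1 / t0))         # B(1/s0) via an independent increment
cov_inv <- cov(s0 * B_inv_s, t0 * B_inv_t)
cov_inv                                           # compare with s0
```

Since a centered Gaussian process is characterized by its covariance function, matching these moments is the essential part of the argument for the time inversion property as well.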


3.7.1 Brownian motion and random walks 

It is possible to see the Brownian motion as the limit of a particular random 
walk. Let Xi, Xi, ..., X n be a sequence of i.i.d. random variables taking only 
values —1 and +1 with equal probability. Let 


S n — Xj + Xi + ■ ■ ■ + x n 

be the partial sum up to time n, with So — 0. S,- representing the position of a 
particle (starting at 0 at time 0) after i (leftward or rightward) jumps of length 1. 
It is possible to show that, for t e [0, 1], 


Vj 



d 




where [x ] denotes the integer part of the number x. The next code is an empirical 
proof of the above convergence. We set t — 0.3 and try to approximate the 
distribution of £(0.3). The graphical output is given in Figure 3.3. 

R> set.seed(123)
R> n <- 500
R> t <- 0.3
R> sim <- 10000
R> B <- numeric(sim)
R> for(i in 1:sim){
+ X <- sample(c(-1,1), n, replace=TRUE)
+ S <- cumsum(X)
+ B[i] <- S[n*t]/sqrt(n)
+ }
R> plot(density(B), main="", ylab="distribution",
+ xlab=expression(B(0.3)), lty=3, axes=FALSE)
R> axis(1)
R> g <- function(x) dnorm(x, sd=sqrt(t))
R> curve(g, -3, 3, add=TRUE)

3.7.2 Brownian motion is a martingale 

It is important to notice that Brownian motion is also a martingale with respect
to its natural filtration. To see this, it is sufficient to show that $E\{B_t | \mathcal{F}_s\} = B_s$ for $s < t$.
Indeed, writing $E\{B_t | \mathcal{F}_s\} = B_s$ is equivalent to writing $E\{B_t - B_s | \mathcal{F}_s\} = 0$ because
$B_s$ is $\mathcal{F}_s$-measurable. Now, the increment $B_t - B_s$ is independent of $\mathcal{F}_s$ by
definition of Brownian motion, and hence $E\{B_t - B_s | \mathcal{F}_s\} = E(B_t - B_s)$ and the
latter is zero because $B_t - B_s \sim N(0, t - s)$.

Exercise 3.5 Prove that $\{X_t = B_t^2 - t, t \ge 0\}$ is a martingale with respect to the
natural filtration of $B_t$.

Exercise 3.6 Let $\mu$ and $\sigma$ be real numbers. Prove that $\{X_t = e^{\mu t + \sigma B_t}, t \ge 0\}$ is
a martingale, with respect to the natural filtration of $B_t$, if and only if $\mu = -\frac{1}{2}\sigma^2$.


3.7.3 Brownian motion and partial differential equations 

Denote by $p(t - s, x, y)$ the transition density of Brownian motion, i.e. the
density of the distribution $P(B_t \in [x, x + dx) | B_s = y)$. This density is clearly a
Gaussian density

$$
p(t - s, x, y) = \frac{1}{\sqrt{2\pi (t - s)}} e^{-\frac{(x - y)^2}{2(t - s)}}.
$$



Figure 3.3 Approximation of Brownian motion distribution (dotted line) as the
limit of a symmetric random walk.
For simplicity, assume that $s = 0$; then it is easy to show that $p(t, x, y)$ solves
the so-called diffusion or heat partial differential equation:

$$
\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial x^2}. \tag{3.15}
$$

Indeed,

$$
\frac{\partial}{\partial t} p(t, x, y) = \frac{(x - y)^2 - t}{2 t^2}\, p(t, x, y),
$$

$$
\frac{\partial}{\partial x} p(t, x, y) = \frac{y - x}{t}\, p(t, x, y),
$$

$$
\frac{\partial^2}{\partial x^2} p(t, x, y) = \frac{(x - y)^2 - t}{t^2}\, p(t, x, y),
$$

which is enough to get the conclusion.
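As a quick numerical sanity check of (3.15), the following sketch compares central finite differences of the Gaussian transition density. The evaluation point and the step size h are arbitrary choices made here for illustration, not values taken from the text.

```r
# Gaussian transition density of Brownian motion started at y (case s = 0)
p <- function(t, x, y = 0) dnorm(x, mean = y, sd = sqrt(t))

t0 <- 0.7; x0 <- 0.4; y0 <- -0.2   # arbitrary evaluation point (assumption)
h  <- 1e-4                         # finite-difference step (assumption)

# central differences for dp/dt and d^2p/dx^2
dp_dt   <- (p(t0 + h, x0, y0) - p(t0 - h, x0, y0)) / (2 * h)
d2p_dx2 <- (p(t0, x0 + h, y0) - 2 * p(t0, x0, y0) + p(t0, x0 - h, y0)) / h^2

# gap between the two sides of the heat equation dp/dt = (1/2) d^2p/dx^2
heat_gap <- abs(dp_dt - 0.5 * d2p_dx2)

# closed form of the time derivative: ((x - y)^2 - t) / (2 t^2) * p
dp_dt_exact <- ((x0 - y0)^2 - t0) / (2 * t0^2) * p(t0, x0, y0)

print(c(dp_dt = dp_dt, half_d2p_dx2 = 0.5 * d2p_dx2, exact = dp_dt_exact))
```

The two finite-difference quantities should agree up to discretization error, and both should be close to the closed-form time derivative.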


3.8 Counting and marked processes 

A point process or a counting process $\{N_t, t \ge 0\}$ is a continuous time process
which records how many times an event has occurred up to some time $t$. This
process takes values on the integers and can be represented as follows:

$$N_t = \sum_{i \ge 1} \mathbf{1}_{\{\tau_i \le t\}},$$

where the $\tau_i$ are the random times at which the events occur. A counting process is a càdlàg
(continue à droite, limite à gauche, i.e. right continuous with left limit) process,
whose trajectory remains constant up to the time when an event occurs. At the
time instant of an arrival, the process $N_t$ jumps upward and continues with a
flat trajectory up to the next event. The notion of càdlàg is then equivalent to saying
that the points where the process jumps (the arrival times) are included in the
upper part of the trajectory (see Figure 3.4).


Figure 3.4 Example of a càdlàg process. Empty circles indicate that the points
do not belong to that part of the trajectory of the process while filled circles denote
the contrary.

Figure 3.5 Example of marked point process with Gaussian marks.

The random times $\tau_i$, $i = 1, 2, \ldots$ are positive random variables whose increments
(the inter-arrival times) are, in most applications, assumed to be independent and with a
common law, as we will see. When the size of the jump is not of unit length, the process is called
a marked point process and the jumps at times $\tau_i$, say $Y_{\tau_i}$, are called marks. Clearly,
the process is no longer interpretable as a counting process. The marks $Y_{\tau_i}$ can
be random variables as well, and the resulting marked point process $\{X_t, t \ge 0\}$
is usually written in this form:

$$X_t = \sum_{i=1}^{N_t} Y_{\tau_i},$$

where $N_t$ is the counting process for the random times $\tau_i$, $i = 1, 2, \ldots$ Figure 3.5
represents a trajectory of a marked point process with independent Gaussian
marks $Y_{\tau_i} \sim N(0, \sigma = 2)$.

3.9 Poisson process 

The Poisson process is one of the principal stochastic processes in continuous
time with discrete state space. It is a counting process with independent inter-arrival
times which are exponentially distributed. This process is used to model the
number of rare events in time. In financial markets, ruptures or big shocks are
usually considered rare events and the Poisson process is used to model them. As for
Brownian motion, this process can be introduced in several ways. We denote
by $N_t = N(t) = N([0, t))$ the value of the process up to time $t$, and by
$N([a, b)) = N_b - N_a$ the number of events between $a$ and $b$. Let $[t, t + \Delta t)$ be
a small time interval. The Poisson process is such that in this small interval at
most one event can occur with non-negligible probability. More precisely,

$$
\begin{aligned}
P(N([t, t + \Delta t)) = 0) &= 1 - \lambda \Delta t + o(\Delta t) \\
P(N([t, t + \Delta t)) = 1) &= \lambda \Delta t + o(\Delta t) \\
P(N([t, t + \Delta t)) \ge 2) &= o(\Delta t)
\end{aligned}
\tag{3.16}
$$

where $o(\Delta t)$ is negligible with respect to $\Delta t$, as $\Delta t \to 0$. In particular, the second
assumption, $P(N([t, t + \Delta t)) = 1) = \lambda \Delta t + o(\Delta t)$, is motivated by saying that
'in a small time interval, the number of arrivals is proportional to the length of
that interval'. Notice that the set of equations (3.16) is just a particular case of
(3.14) for continuous time Markov chains. It is also assumed that the increments
of the Poisson process are independent of the past. Under these assumptions it is
possible to show (see Cox and Miller 1965) that

$$P(N_t = k) = e^{-\lambda t} \frac{(\lambda t)^k}{k!}, \quad k = 0, 1, 2, \ldots$$

which means that $N_t$ is a Poisson random variable with rate $\lambda t$, i.e. $N_t \sim \mathrm{Poi}(\lambda t)$.
Further, let $T$ be the random time between two Poisson events; then $T \sim \mathrm{Exp}(\lambda)$.
We will use this aspect in the simulation of the Poisson process.
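The exponential inter-arrival property suggests a direct way to simulate the process. The sketch below is only an illustration: the rate, the horizon and the helper name count_events are choices made here, not taken from the text. It counts the events in [0, t] by cumulating exponential waiting times and compares the sample mean and variance of the counts with the Poisson value, which is the rate times t for both.

```r
set.seed(123)
lambda <- 2      # rate of the process (assumption)
t_end  <- 5      # time horizon (assumption)
nsim   <- 5000   # number of simulated paths (assumption)

# Count the arrivals in [0, t_end] by cumulating exponential waiting times.
# count_events is a hypothetical helper written for this illustration.
count_events <- function(lambda, t_end) {
  n <- 0
  clock <- rexp(1, rate = lambda)          # time of the first event
  while (clock <= t_end) {
    n <- n + 1
    clock <- clock + rexp(1, rate = lambda)  # add the next waiting time
  }
  n
}

N <- replicate(nsim, count_events(lambda, t_end))
print(c(mean = mean(N), var = var(N), lambda_t = lambda * t_end))
```

Both the sample mean and the sample variance should be close to lambda * t_end, as expected for a Poisson random variable.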

The above version of the Poisson process is called homogeneous because the
rate $\lambda$ at which events occur is constant. It is possible to generalize it to the case
in which $\lambda = \lambda(t)$ is a function of time. The nonhomogeneous Poisson process
is characterized by its intensity function

$$\Lambda(t) = \int_0^t \lambda(s)\, ds$$

and its distribution has the following form:

$$P(N_t = k) = e^{-\Lambda(t)} \frac{\Lambda(t)^k}{k!}, \quad k = 0, 1, 2, \ldots$$

i.e. $N_t \sim \mathrm{Poi}(\Lambda(t))$. From the fact that $N_t$ is a Poisson random variable we have
immediately that $E(N_t) = \Lambda(t)$. Clearly, when $\lambda(t) = \lambda$ then $\Lambda(t) = \lambda t$ and the
process is homogeneous. The statistical analysis and modern treatment of general
Poisson processes can be found in Kutoyants (1998).
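One possible way to simulate a nonhomogeneous Poisson process is thinning, a technique not discussed in the text and used here only as an illustration. The intensity lambda_fun below is a hypothetical choice, bounded by lambda_max; candidate arrivals from a homogeneous process with rate lambda_max are kept with probability lambda_fun(s) / lambda_max. The sample mean of the counts is then compared with the integrated intensity.

```r
set.seed(123)
lambda_fun <- function(s) 1 + sin(s)^2   # hypothetical intensity, bounded by 2
lambda_max <- 2                          # upper bound of lambda_fun
t_end <- 4                               # time horizon (assumption)
nsim  <- 5000                            # number of replications (assumption)

# Thinning: draw candidate arrivals of a homogeneous process with rate
# lambda_max on [0, t_end] and keep a candidate at time s with
# probability lambda_fun(s) / lambda_max.
count_nhpp <- function() {
  n_cand <- rpois(1, lambda_max * t_end)   # number of candidate arrivals
  s <- runif(n_cand, 0, t_end)             # candidate times (unordered)
  sum(runif(n_cand) < lambda_fun(s) / lambda_max)
}

N <- replicate(nsim, count_nhpp())
Lambda_t <- integrate(lambda_fun, 0, t_end)$value  # integral of lambda over [0, t_end]
print(c(mean = mean(N), var = var(N), Lambda_t = Lambda_t))
```

For this particular intensity the integral is available in closed form, 6 - sin(8)/4, which gives an independent check of the numerical integration.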


3.10 Compound Poisson process 

The compound Poisson process is defined as $X_t = \sum_{i=1}^{N_t} Y_{\tau_i}$, where $N_t$ is a
Poisson process and the $Y_{\tau_i}$ are the jumps (the marks) at the random times $\tau_i$. As will
be explained later, the compound Poisson process plays an important role in the
construction of the Lévy process. It is easy to derive the mean of the compound
Poisson process.

Theorem 3.10.1 Let $X_t$ be a compound Poisson process and let the $Y_{\tau_i}$ be i.i.d. with
common mean $\mu$. Then

$$E(X_t) = \mu \Lambda(t).$$
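Proof.

$$
\begin{aligned}
E(X_t) &= E\left( \sum_{i=1}^{N_t} Y_{\tau_i} \right) = E\left\{ E\left( \sum_{i=1}^{N_t} Y_{\tau_i} \,\Big|\, N_t \right) \right\} \\
&= \sum_{k=1}^{\infty} E\left\{ \sum_{i=1}^{k} Y_{\tau_i} \,\Big|\, N_t = k \right\} P(N_t = k) \\
&= \sum_{k=1}^{\infty} \left( \sum_{i=1}^{k} E(Y_{\tau_i}) \right) P(N_t = k) = \sum_{k=1}^{\infty} \mu k P(N_t = k) \\
&= \mu \sum_{k=0}^{\infty} k P(N_t = k) = \mu E(N_t) \\
&= \mu \Lambda(t)
\end{aligned}
$$

because for $k = 0$, $k P(N_t = k) = 0$.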
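The conditioning argument of the proof translates directly into a simulation recipe: draw the number of events first, then sum that many marks. The sketch below assumes a homogeneous Poisson process and Gaussian marks; the parameter values and the helper name rcpois are choices made here for illustration.

```r
set.seed(123)
lambda <- 3; t_end <- 2     # homogeneous case: integrated intensity lambda * t_end
mu <- 0.5; sigma <- 2       # Gaussian marks N(mu, sigma^2) (assumption)
nsim <- 20000               # number of replications (assumption)

# Conditioning argument: draw N_t first, then sum N_t i.i.d. marks.
# rcpois is a hypothetical helper written for this illustration.
rcpois <- function() {
  n <- rpois(1, lambda * t_end)
  sum(rnorm(n, mean = mu, sd = sigma))   # equals 0 when n == 0
}

X <- replicate(nsim, rcpois())
print(c(mean = mean(X), theory = mu * lambda * t_end))
```

The sample mean should be close to mu * lambda * t_end, in agreement with Theorem 3.10.1.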

Although the proof is elementary, it is useful to see it because of the conditioning
argument which is often used to prove results involving Poisson processes.
When $Y_{\tau_i}$ is a sequence of i.i.d. random variables with common characteristic
function $\varphi_Y(u)$, the characteristic function of a compound Poisson process
can also be easily determined. Indeed,

$$
\begin{aligned}
\varphi_{X_t}(u) &= E\left(e^{iuX_t}\right) = E\left\{ \exp\left( iu \sum_{i=1}^{N_t} Y_{\tau_i} \right) \right\}
= E\left\{ E\left\{ \exp\left( iu \sum_{i=1}^{N_t} Y_{\tau_i} \right) \Big|\, N_t \right\} \right\} = E\left( \varphi_Y(u)^{N_t} \right) \\
&= \sum_{k=0}^{\infty} \varphi_Y(u)^k \frac{(\lambda t)^k}{k!} e^{-\lambda t}
= e^{-\lambda t} \sum_{k=0}^{\infty} \frac{(\varphi_Y(u) \lambda t)^k}{k!} \\
&= \exp\{ \lambda t (\varphi_Y(u) - 1) \}.
\end{aligned}
$$

Note that, conditionally on the value of $N_t$, $X_t$ is just a sum of independent
random variables with common characteristic function $\varphi_Y(u)$, so we applied
Equation (2.7) in the last equality of the first line of the proof. As usual, even
if $\varphi_{X_t}(u)$ is a nice formula, an explicit version of the distribution of $X_t$ cannot
always be found and FFT or other methods have to be considered. But there is
at least one special case for which it is possible to derive the distribution of $X_t$
without inverting the characteristic function. Assume that the marks
are i.i.d. random variables with common law $N(\mu, \sigma^2)$. Then $X_t$, conditionally on
the value assumed by $N_t$, is just the sum of $N_t$ i.i.d. Gaussian random variables,
i.e. $X_t | N_t$ is distributed as $N(\mu N_t, \sigma^2 N_t)$. From this fact, summing over
all possible values of $N_t$, one gets the series-type expression

$$
F_{X_t}(x) = P(X_t \le x) = \sum_{k=1}^{\infty} P(X_t \le x \,|\, N_t = k)\, P(N_t = k)
= \sum_{k=1}^{\infty} P\left( \sum_{i=1}^{k} Y_{\tau_i} \le x \right) \frac{(\lambda t)^k}{k!} e^{-\lambda t}
$$

with $\sum_{i=1}^{k} Y_{\tau_i} \sim N(k\mu, k\sigma^2)$. Therefore, the density of $X_t$ is given by the
following expression

$$
f_{X_t}(x) = \sum_{k=1}^{\infty} \frac{1}{\sqrt{2\pi k \sigma^2}} \exp\left\{ -\frac{(x - k\mu)^2}{2 k \sigma^2} \right\} \frac{(\lambda t)^k}{k!} e^{-\lambda t}.
$$

This is an infinite series where the higher order terms are negligible because
$P(N_t = k)$ converges quite fast to zero as $k$ increases. Therefore, depending on
the values of $\lambda$ and $t$, only a reasonably small number of terms has to be retained in
the approximation. The Poisson and the compound Poisson processes are also
time-invariant, in the following sense:

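Theorem 3.10.2 Let $X_t$ be a compound Poisson process with intensity function
$\Lambda(t)$ and $Y_{\tau_i}$ a sequence of i.i.d. random variables. Then

$$X_t - X_s \sim X_{t-s}, \quad s < t.$$

Proof.

$$
P(X_t - X_s \le x) = P\left( \sum_{i=1}^{N_t} Y_{\tau_i} - \sum_{i=1}^{N_s} Y_{\tau_i} \le x \right)
= P\left( \sum_{i=N_s+1}^{N_t} Y_{\tau_i} \le x \right)
$$

but the above sum contains $N_t - N_s$ terms, and because the $Y_{\tau_i}$ are i.i.d. random
variables, we have that

$$
P\left( \sum_{i=N_s+1}^{N_t} Y_{\tau_i} \le x \right) = P\left( \sum_{i=1}^{N_t - N_s} Y_{\tau_i} \le x \right).
$$

Now, notice that $N_t - N_s \sim N_{t-s}$ because $N_t = N_s + N_{t-s}$ in distribution, due to independence
of the Poisson increments from the past. Therefore,

$$
P(X_t - X_s \le x) = P\left( \sum_{i=1}^{N_{t-s}} Y_{\tau_i} \le x \right) = P(X_{t-s} \le x).
$$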

3.11 Compensated Poisson processes 

We have seen that the Poisson process $N_t$ has mean $\Lambda(t)$, and in general $\Lambda(t) > 0$.
This implies that $N_t$ is not a martingale, but it is possible to compensate the
Poisson process with another process which makes it a martingale. The process
$A_t$ which makes

$$M_t = N_t - A_t$$

a martingale is called the compensator.

Theorem 3.11.1 Let $X_t$ be a compound Poisson process with intensity function
$\Lambda(t) = \int_0^t \lambda(u)\, du$ and let the random variables $Y_{\tau_i}$ be i.i.d. with common mean
$\mu$; then

$$M_t = X_t - \mu \Lambda(t)$$

is a martingale.

In short, the compensator $A_t$ of a standard or compound Poisson process is its
mean $A_t = E X_t$. Notice that, for the standard Poisson process, $Y_{\tau_i} = 1$ for all
$i$ and hence $\mu = 1$. Therefore, the corresponding martingale process is $M_t =
N_t - \Lambda(t)$.

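Proof.

$$
\begin{aligned}
E\{M_t | \mathcal{F}_s\} &= E\{X_t - \mu \Lambda(t) \pm X_s | \mathcal{F}_s\} = -\mu \Lambda(t) + E\{X_t - X_s + X_s | \mathcal{F}_s\} \\
&= -\mu \Lambda(t) + E(X_t - X_s) + X_s = -\mu \Lambda(t) + E(X_{t-s}) + X_s \\
&= -\mu \Lambda(t) + \mu \Lambda(t - s) + X_s \\
&= -\mu \int_0^t \lambda(u)\, du + \mu \Lambda(t - s) + X_s \\
&= -\mu \Lambda(s) + X_s = M_s.
\end{aligned}
$$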


3.12 Telegraph process 

The telegraph process, studied in Goldstein (1951) and Kac (1974), models a
random motion with finite velocity and is usually proposed as an alternative to
diffusion models. The process describes the position of a particle moving on the
real line, alternately with constant velocity $+c$ or $-c$. The changes of direction
are governed by a Poisson process with intensity function $\Lambda(t)$. The telegraph
process or telegrapher's process is defined as

$$X_t = x_0 + V_0 \int_0^t (-1)^{N_s}\, ds, \quad t > 0, \tag{3.17}$$

where $V_0$ is the initial velocity taking values $\pm c$ with equal probability and
independently of the Poisson process $\{N_t, t > 0\}$. For simplicity, we assume that
$x_0$ is not random and we choose it as $x_0 = 0$. Related to the telegraph process is
its velocity process defined as

$$V_t = V_0 (-1)^{N_t}, \quad t > 0,$$

so that $X_t = \int_0^t V_s\, ds$. The telegraph process is not Markovian, while the coordinate
process $(X_t, V_t)$ is. Unfortunately, in applications with discrete time data,
only the position $X_t$ is observed but not the velocity. We will return to this
problem when we discuss estimation for this model. We now briefly study
the velocity process $V_t$. The following conditional laws:

$$P(V_t = +c \,|\, V_0 = +c) \quad \text{and} \quad P(V_t = -c \,|\, V_0 = +c),$$

characterize the velocity process. Their explicit forms are as follows.

Theorem 3.12.1

$$P(V(t) = c \,|\, V(0) = c) = 1 - \int_0^t \lambda(s) e^{-2\Lambda(s)}\, ds = \frac{1 + e^{-2\Lambda(t)}}{2}, \tag{3.18}$$

$$P(V(t) = -c \,|\, V(0) = c) = \int_0^t \lambda(s) e^{-2\Lambda(s)}\, ds = \frac{1 - e^{-2\Lambda(t)}}{2}. \tag{3.19}$$

Proof. First notice that the two probabilities

$$p^{(+c)}(t) = P(V_t = +c) \quad \text{and} \quad p^{(-c)}(t) = P(V_t = -c)$$

are solutions of the following system of differential equations

$$
\begin{cases}
p^{(+c)}_t(t) = \Lambda_t(t) \left( p^{(-c)}(t) - p^{(+c)}(t) \right) \\
p^{(-c)}_t(t) = \Lambda_t(t) \left( p^{(+c)}(t) - p^{(-c)}(t) \right) = -p^{(+c)}_t(t)
\end{cases}
\tag{3.20}
$$

This can be proved by Taylor expansion of the two functions $p^{(+c)}(t)$ and
$p^{(-c)}(t)$. We give only the derivation of (3.18) because the other follows by symmetry.
Conditioning on $V_0 = +c$ implies that $p^{(+c)}(0) = 1$, $p^{(+c)}_t(0) = -\Lambda_t(0) =
-\lambda(0)$, $p^{(-c)}(0) = 0$ and $p^{(-c)}_t(0) = \lambda(0)$. From (3.20) we obtain that

$$\frac{p^{(+c)}_{tt}(t)}{p^{(+c)}_t(t)} = \frac{\lambda_t(t)}{\lambda(t)} - 2\lambda(t),$$

and by simple integration it emerges that

$$p^{(+c)}_t(t) = -\lambda(t)\, e^{-2\Lambda(t)},$$

from which (3.18) also follows. Remark that it is possible to derive (3.18) by
simply noting that

$$
P(V_t = +c \,|\, V_0 = +c) = P\left( \bigcup_{k=0}^{\infty} \{N_t = 2k\} \right)
= \sum_{k=0}^{\infty} \frac{\Lambda(t)^{2k}}{(2k)!} e^{-\Lambda(t)} = \frac{1 + e^{-2\Lambda(t)}}{2}
$$

and then $P(V_t = -c \,|\, V_0 = +c) = 1 - P(V_t = +c \,|\, V_0 = +c)$.
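Formula (3.18) can be checked numerically in the homogeneous case, where the integrated intensity equals the rate times t. The sketch below is an illustration with arbitrary parameter values chosen here; it uses the fact that the velocity keeps its initial sign exactly when the number of Poisson events is even.

```r
set.seed(123)
lambda <- 1.5   # constant rate (assumption)
t_end  <- 0.8   # time at which the velocity is observed (assumption)
nsim   <- 20000 # number of replications (assumption)

# V_t = V_0 * (-1)^{N_t}: the velocity equals V_0 iff N_t is even.
N_t  <- rpois(nsim, lambda * t_end)
same <- (N_t %% 2 == 0)

p_hat    <- mean(same)                               # simulated probability
p_theory <- (1 + exp(-2 * lambda * t_end)) / 2       # closed form in (3.18)
# integral form in (3.18), evaluated numerically
p_integral <- 1 - integrate(function(s) lambda * exp(-2 * lambda * s),
                            0, t_end)$value

print(c(simulated = p_hat, closed_form = p_theory, integral = p_integral))
```

The three numbers should agree, the first one up to Monte Carlo error.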

We conclude the analysis of the velocity process by giving also the covariance
function of the couple $(V_t, V_s)$, $s, t > 0$. It can be easily proven that the
characteristic function of the couple $(V_s, V_t)$ is, for all $(\alpha, \beta) \in \mathbb{R}^2$,

$$E\left( e^{i\alpha V_s + i\beta V_t} \right) = \cos(\alpha c) \cos(\beta c) - e^{-2|\Lambda(t) - \Lambda(s)|} \sin(\alpha c) \sin(\beta c).$$

The velocity process has zero mean, therefore the covariance function is

$$
\mathrm{Cov}(V_s, V_t) = E(V_s V_t) = -\frac{\partial^2}{\partial\alpha\, \partial\beta} E\left( e^{i\alpha V(s) + i\beta V(t)} \right) \Big|_{\alpha = \beta = 0}
= c^2 e^{-2|\Lambda(t) - \Lambda(s)|}.
$$


3.12.1 Telegraph process and partial differential equations 

The name of this process comes from the fact that the law of the process $X_t$
is a solution of the so-called telegraph equation, which is a hyperbolic partial
differential equation of the following type

$$u_{tt}(t, x) + 2\lambda u_t(t, x) = c^2 u_{xx}(t, x) \tag{3.21}$$

where $u(t, x)$ is a function twice differentiable with respect to both arguments $x$
and $t$, with $\lambda$ and $c$ two constants. The
telegraph equation emerges in the physical analysis of the propagation of quasi-stationary
electric oscillations along telegraph cables and it is sometimes called the wave
equation for this reason. We now give a derivation of the telegraph equation in
the nonhomogeneous case. We consider the distribution of the position of the
particle at time $t$

$$P(t, x) = P(X_t \le x) \tag{3.22}$$

and we introduce the two distribution functions $F(t, x) = P(X_t \le x, V_t = +c)$
and $B(t, x) = P(X_t \le x, V_t = -c)$, so that $P(t, x) = F(t, x) + B(t, x)$ and
$W(t, x) = F(t, x) - B(t, x)$. The function $W(\cdot, \cdot)$ is usually called the 'flow
function'.
Theorem 3.12.2 Suppose that $F(\cdot, \cdot)$ and $B(\cdot, \cdot)$ are two times differentiable in
$x$ and $t$, then

$$
\begin{cases}
F_t(t, x) = -c F_x(t, x) - \lambda(t)(F(t, x) - B(t, x)) \\
B_t(t, x) = c B_x(t, x) + \lambda(t)(F(t, x) - B(t, x))
\end{cases}
\tag{3.23}
$$

moreover $P(t, x)$ in (3.22) is a solution to the following telegraph equation with
nonconstant coefficients

$$\frac{\partial^2}{\partial t^2} u(t, x) + 2\lambda(t) \frac{\partial}{\partial t} u(t, x) = c^2 \frac{\partial^2}{\partial x^2} u(t, x). \tag{3.24}$$

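Proof. By Taylor expansion, one gets that $F(\cdot, \cdot)$ and $B(\cdot, \cdot)$ are solutions to
(3.23), and rewriting system (3.23) in terms of the functions $W(\cdot, \cdot)$ and $P(\cdot, \cdot)$
it emerges that

$$
\begin{cases}
P_t(t, x) = -c W_x(t, x) \\
W_t(t, x) = -c P_x(t, x) - 2\lambda(t) W(t, x)
\end{cases}
\tag{3.25}
$$

The conclusion arises by direct substitutions. In fact, from the first equation of system
(3.25) we have

$$P_{tt}(t, x) = \frac{\partial}{\partial t} P_t(t, x) = -c \frac{\partial}{\partial t} W_x(t, x).$$

Furthermore,

$$
\begin{aligned}
W_{tx}(t, x) &= \frac{\partial}{\partial x} \left( -c P_x(t, x) - 2\lambda(t) W(t, x) \right) \\
&= -c P_{xx}(t, x) - 2\lambda(t) W_x(t, x) \\
&= -c P_{xx}(t, x) + \frac{2}{c} \lambda(t) P_t(t, x)
\end{aligned}
$$

by using respectively the second and the first equation of system (3.25).
Substituting this into the expression for $P_{tt}$ gives $P_{tt} = c^2 P_{xx} - 2\lambda(t) P_t$, which is (3.24).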

Consider again the distribution $P(X_t \le x)$ in (3.22), which can be rewritten
as the transition from time 0 to time $t$ of the telegraph process, i.e. $P(X_t \le
x \,|\, x_0 = 0)$ (we have initially set $x_0 = 0$, but the argument works for any initial
non-random $x_0$). This distribution is a mix of a continuous and a discrete component.
The discrete component arises when there are no Poisson events in $[0, t)$. In
this case, $X_t = \pm ct$ depending on the initial velocity $V_0 = \pm c$, and this happens
with probability $\frac{1}{2} e^{-\lambda t}$. We know that (3.22) solves the telegraph equation (3.24),
and so does the density of its absolutely continuous component. By different means Goldstein
(1951), Orsingher (1990) and Pinsky (1991) obtained the transition density,
which we present as the sum of the continuous and discrete part using the Dirac
delta function

$$
\begin{aligned}
p(t, x; 0, x_0) = {} & \frac{e^{-\lambda t}}{2c} \left\{ \lambda I_0\left( \frac{\lambda}{c} \sqrt{c^2 t^2 - (x - x_0)^2} \right)
+ \frac{\partial}{\partial t} I_0\left( \frac{\lambda}{c} \sqrt{c^2 t^2 - (x - x_0)^2} \right) \right\} \mathbf{1}_{\{|x - x_0| < ct\}} \\
& + \frac{e^{-\lambda t}}{2} \left\{ \delta(x - x_0 - ct) + \delta(x - x_0 + ct) \right\}
\end{aligned}
\tag{3.26}
$$

for any $|x - x_0| \le ct$, where $I_\nu(x)$ is the modified Bessel function of order $\nu$
(see Abramowitz and Stegun (1964)) and $\delta$ is the Dirac delta function. For the
nonhomogeneous case, the general solution to (3.24) is not known, except in one case.
The proof can be found in Iacus (2001).

Theorem 3.12.3 Suppose that the intensity function of the Poisson process $N_t$ in
(3.24) is

$$\lambda(t) = \lambda_\theta(t) = \theta \tanh(\theta t), \quad \theta \in \mathbb{R}. \tag{3.27}$$

Then, the absolutely continuous component $p_\theta(\cdot)$ of distribution (3.22),
conditionally on $V_0 = +c$, is given by

Pe(t,x | Vq = +c) 


cosh(e 0 2 Vc 2 t 2 -x 2 


0, 


\x\ < ct, 
otherwise. 


(3.28) 


3.12.2 Moments of the telegraph process 

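Coming back to the homogeneous version of (3.22), we have the following results
concerning the moments of the telegraph process. Again we assume $x_0 = 0$. The
first two moments of the process are well known and derived in Orsingher (1990)

$$E(X_t) = 0 \quad \text{and} \quad E(X_t^2) = \frac{c^2}{\lambda} \left( t - \frac{1 - e^{-2\lambda t}}{2\lambda} \right). \tag{3.29}$$

The next result generalizes (3.29) to any positive integer $q$ (see Iacus and Yoshida (2008)).

Theorem 3.12.4 For every positive integer $q$,

$$
E\{X_t^{2q}\} = (ct)^{2q} \left( \frac{2}{\lambda t} \right)^{q - \frac{1}{2}} \Gamma\left( q + \frac{1}{2} \right)
\left\{ I_{q + \frac{1}{2}}(\lambda t) + I_{q - \frac{1}{2}}(\lambda t) \right\} e^{-\lambda t}. \tag{3.30}
$$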

The modified Bessel functions admit the following expansion

$$
I_\nu(x) = \frac{(x/2)^\nu}{\Gamma(\nu + 1)} \left\{ 1 + \frac{x^2}{4(\nu + 1)} + \frac{x^4}{32(\nu + 1)(\nu + 2)} + O(x^6) \right\}
$$

from which we obtain that $E\{X_t^{2q}\}$ is of order $t^{2q}$ as $t \to 0$. The following
expansions, for $t \to 0$, will be useful later:

$$E\{X_t^2\} = c^2 t^2 - \frac{2}{3} c^2 \lambda t^3 + \frac{1}{3} c^2 \lambda^2 t^4 + o(t^4) \tag{3.31}$$

$$E\{X_t^4\} = c^4 t^4 - \frac{4}{5} c^4 \lambda t^5 + \frac{2}{5} c^4 \lambda^2 t^6 + o(t^6) \tag{3.32}$$

$$E\{X_t^6\} = c^6 t^6 - \frac{6}{7} c^6 \lambda t^7 + \frac{3}{7} c^6 \lambda^2 t^8 + o(t^8) \tag{3.33}$$

The moment generating function for $X_t$ was derived in Di Crescenzo and
Martinucci (2006).


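Theorem 3.12.5 For all $s \in \mathbb{R}$ and $t \ge 0$

$$
E\left( e^{sX_t} \right) = e^{-\lambda t} \left\{ \cosh\left( t \sqrt{\lambda^2 + s^2 c^2} \right)
+ \frac{\lambda}{\sqrt{\lambda^2 + s^2 c^2}} \sinh\left( t \sqrt{\lambda^2 + s^2 c^2} \right) \right\}. \tag{3.34}
$$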


Many authors have analyzed probabilistic properties of the process over the years (see
for example Orsingher (1985, 1990); Pinsky (1991); Fong and Kanno (1994);
Stadje and Zacks (2004)), but here we have only mentioned those that will be used in
the next chapters.
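The moment formulas of this section can be checked by simulating the homogeneous telegraph process exactly, switching the sign of the velocity at exponentially distributed times. The sketch below is an illustration only: the parameter values and the helper name rtelegraph are assumptions made here. It compares the simulated second moment with the closed form (c^2/lambda)(t - (1 - exp(-2 lambda t))/(2 lambda)) and with the Bessel-function expression of Theorem 3.12.4 for q = 1.

```r
set.seed(123)
lambda <- 2; cc <- 1.5; t_end <- 1   # rate, speed and time (assumptions)
nsim <- 20000                        # number of replications (assumption)

# One draw of X_t with x0 = 0: the particle moves at speed cc and changes
# direction at the event times of a Poisson process with rate lambda.
# rtelegraph is a hypothetical helper written for this illustration.
rtelegraph <- function() {
  v <- sample(c(-cc, cc), 1)       # initial velocity, +/- cc with equal probability
  x <- 0; clock <- 0
  repeat {
    w <- rexp(1, rate = lambda)    # waiting time to the next switch
    if (clock + w >= t_end) {
      return(x + v * (t_end - clock))
    }
    x <- x + v * w
    clock <- clock + w
    v <- -v
  }
}

X <- replicate(nsim, rtelegraph())

# closed form of the second moment
m2 <- cc^2 / lambda * (t_end - (1 - exp(-2 * lambda * t_end)) / (2 * lambda))
# Bessel-function expression for q = 1
lt <- lambda * t_end
m2_bessel <- (cc * t_end)^2 * sqrt(2 / lt) * gamma(1.5) *
  (besselI(lt, nu = 1.5) + besselI(lt, nu = 0.5)) * exp(-lt)

print(c(mean = mean(X), second_moment = mean(X^2),
        closed_form = m2, bessel = m2_bessel))
```

Since the particle travels at finite speed, every simulated position also satisfies |X_t| <= c t, which is a useful sanity check on the simulation itself.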


3.12.3 Telegraph process and Brownian motion 

It is quite easy to see that in the limit as $c \to \infty$, $\lambda \to \infty$ and $\lambda / c^2 \to 1$,
the telegraph equation (3.21) converges to the heat equation (3.15): dividing (3.21) by $2\lambda$,
the term $u_{tt}/(2\lambda)$ vanishes and $c^2/(2\lambda) \to 1/2$. Although
a rigorous proof can be written, the above result intuitively means that, in the
limit when both the velocity and the intensity of the Poisson process diverge,
the telegraph process converges in law to a Brownian motion. This is also true at
path level. Indeed, the limiting process moves at infinite speed and its trajectory
becomes nowhere differentiable, as in the Brownian motion case.


3.13 Stochastic integrals 

In Chapter 1 we introduced stochastic differential equations as a way to model
returns of an asset price $\{S_t, t \ge 0\}$ in this form:

$$\frac{dS_t}{S_t} = \text{deterministic contribution} + \text{stochastic contribution}$$

or, more precisely,

$$\frac{dS_t}{S_t} = \mu\, dt + \sigma\, dB_t,$$

where $dB_t$ is the variation of the Brownian motion. This writing is mathematically
acceptable if $dt > 0$ is not infinitesimal, otherwise the following limiting equation,
as $dt \to 0$,

$$dS_t = \mu S_t\, dt + \sigma S_t\, dB_t \tag{3.35}$$

is not correct because Brownian motion has infinite variation. We also noticed
that it is possible to give a precise meaning to (3.35) in integral form:

$$S_t = S_0 + \mu \int_0^t S_u\, du + \sigma \int_0^t S_u\, dB_u,$$

where $\int_0^t S_u\, dB_u$ is the stochastic or Itô integral. We now introduce the Itô integral in
a more precise way, but before giving the formal definition we offer an intuitive
way to construct such an object. In fact, integrals of random processes, say
$\{X_t, t \ge 0\}$,

$$I(X) = \int_0^t X_u\, du$$

can be interpreted, $\omega$ by $\omega$, as ordinary integrals, i.e.

$$I(X, \omega) = \int_0^t X(u, \omega)\, du$$

and then, loosely speaking, $I(X)$ can be interpreted as the random 'area' under
the trajectory of the process. On the contrary, the stochastic integral involves
randomness in the integrator part as well and, even $\omega$ by $\omega$, this writing

$$I(X, \omega) = \int_0^t X(u, \omega)\, dB(u, \omega)$$

needs attention. To understand the following, consider $f(u) = X(u, \omega)$ and
$g(u) = B(u, \omega)$, for a given fixed $\omega$, as two nonrandom functions and forget for
a while the link between $g(\cdot)$ and Brownian motion. When $g(\cdot)$ is differentiable,
it is possible to define integrals of the form $\int_0^t f(u)\, dg(u)$ as

$$\int_0^t f(u)\, dg(u) = \int_0^t f(u) g'(u)\, du.$$

If $g(\cdot)$ is not differentiable but has at least finite variation (e.g. $g(x) = |x|$) it is
still possible to define the integral of $f(\cdot)$ with respect to the variation of $g(\cdot)$
as follows:

$$\lim_{\|\Pi_n\| \to 0} \sum_{i=0}^{n-1} f(s_i) (g(s_{i+1}) - g(s_i)),$$
where $\Pi_n = \Pi_n([0, t]) = \{0 = s_0 < s_1 < \cdots < s_n = t\}$ and the limit is as
$n \to \infty$. As mentioned, we cannot introduce the stochastic integral as follows:

$$\int_0^t X(u)\, dB(u) = \lim_{\|\Pi_n\| \to 0} \sum_{i=0}^{n-1} X(s_i) (B(s_{i+1}) - B(s_i)) \tag{3.36}$$

because $V_t(B) = +\infty$. We need to consider the limit in a different metric instead.
Assume that $X(s_i)$ is independent of the Brownian increment $B(s_{i+1}) - B(s_i)$ in
(3.36). We have that

$$
E\left[ X(s_i) (B(s_{i+1}) - B(s_i)) \right]^2 = E(X^2(s_i))\, E(B(s_{i+1}) - B(s_i))^2
= E(X^2(s_i)) \cdot (s_{i+1} - s_i)
$$

and hence

$$
E\left[ \sum_{i=0}^{n-1} X(s_i) (B(s_{i+1}) - B(s_i)) \right]^2
= E\left[ \sum_{i=0}^{n-1} X(s_i)^2 (B(s_{i+1}) - B(s_i))^2 \right] + A
= \sum_{i=0}^{n-1} E(X(s_i)^2) (s_{i+1} - s_i)
$$

where the term $A$ contains cross product terms of the form

$$X(s_i) (B(s_{i+1}) - B(s_i))\, X(s_j) (B(s_{j+1}) - B(s_j))$$

with $i \neq j$. By the properties of the increments of Brownian motion and the
assumption of independence of $X(s_i)$ and $B(s_{i+1}) - B(s_i)$, the expected values
of those terms are all zero. Therefore, we have that

$$
\lim_{\|\Pi_n\| \to 0} E\left( \sum_{i=0}^{n-1} X(s_i) (B(s_{i+1}) - B(s_i)) \right)^2
= \lim_{\|\Pi_n\| \to 0} \sum_{i=0}^{n-1} E(X(s_i)^2) (s_{i+1} - s_i) = \int_0^t E(X_u^2)\, du.
$$


So we can define the stochastic integral as follows:

$$I(X) = \lim_{\|\Pi_n\| \to 0} \sum_{i=0}^{n-1} X(s_i) (B(s_{i+1}) - B(s_i)) \tag{3.37}$$

where the limit is understood in quadratic mean. Now we need to notice two facts:

(i) the limit in (3.37) exists only if $\int_0^t E(X_u^2)\, du < \infty$;

(ii) we assumed independence of $X(s_i)$ from $B(s_{i+1}) - B(s_i)$. This assumption,
more precisely, should be stated as $X$ being adapted to the filtration
generated by the Brownian motion.

The above are two elementary properties we need to require of the integrand $X$ in
order to have a well-defined stochastic integral.

Definition 3.13.1 (Stochastic integral) Let $\{X_t, t \ge 0\}$ be a stochastic process
adapted to the filtration generated by the Brownian motion and such that
$\int_0^t E(X_u^2)\, du < \infty$. The stochastic integral of $X$ is defined as

$$\int_0^t X_u\, dB_u = \lim_{\|\Pi_n\| \to 0} \sum_{i=0}^{n-1} X(s_i) (B(s_{i+1}) - B(s_i)),$$

where the limit is in quadratic mean.

It is not always evident that the stochastic integral exists.

Example 3.13.2 Consider the stochastic process $X$ defined as $X_s = s^{-1} B_s$.
This process is clearly adapted to the natural filtration of $B$. Let us check the
integrability condition

$$
\int_0^t E(X_u^2)\, du = \int_0^t u^{-2} E(B_u^2)\, du = \int_0^t u^{-2} u\, du = \int_0^t u^{-1}\, du = \log t - \log 0 = \infty.
$$

Thus, for this process the stochastic integral is not well defined. If instead we take
$X(s) = B(s)$, then the integrability condition is verified

$$\int_0^t E(B_u^2)\, du = \int_0^t u\, du = \frac{t^2}{2} < \infty.$$

Example 3.13.3 Consider the stochastic process $X_s = B(s + 1)$. The integrability
condition is clearly satisfied, but this process is not adapted to the natural
filtration of $B$, i.e. $X_s = B(s + 1)$ is not $\mathcal{F}_s$-measurable, because $\mathcal{F}_s = \sigma(B_u, u \le
s)$ and events concerning $B(s + 1)$ are not contained in $\mathcal{F}_s$.

Notice that, in Definition 3.13.1, the summands are of the form
$X(s_i)(B(s_{i+1}) - B(s_i))$ and we ask for adaptedness in order to be able to
calculate the expected value of the square of these terms. It is possible to define the
stochastic integral in the Stratonovich sense as follows:

$$\int_0^t X_u \circ dB_u = \lim_{\|\Pi_n\| \to 0} \sum_{i=0}^{n-1} X\left( \frac{s_i + s_{i+1}}{2} \right) (B(s_{i+1}) - B(s_i)),$$

i.e. considering the value of $X$ at the mid-point of the interval $[s_i, s_{i+1}]$. In this
case, adaptedness of $X$ is not enough and the stochastic integral does not have all the
properties which are very useful in stochastic calculus.¹
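The difference between the left-point (Itô) and mid-point (Stratonovich) sums can be seen numerically for the integrand B itself. The sketch below is an illustration under choices made here: the Brownian path is simulated on a grid twice as fine as the partition, so that the mid-point values are available. The left-point sum is compared with B_t^2/2 - t/2 and the mid-point sum with B_t^2/2, the value given by the ordinary rules of calculus.

```r
set.seed(123)
t_end <- 1
n     <- 2^13                        # number of partition intervals (assumption)

# Simulate B on a grid twice as fine, so that mid-points are available.
dB_fine <- rnorm(2 * n, sd = sqrt(t_end / (2 * n)))
B_fine  <- c(0, cumsum(dB_fine))     # B at times k * t_end / (2n), k = 0, ..., 2n

left  <- B_fine[seq(1, 2 * n - 1, by = 2)]   # B(s_i)
mid   <- B_fine[seq(2, 2 * n,     by = 2)]   # B((s_i + s_{i+1}) / 2)
right <- B_fine[seq(3, 2 * n + 1, by = 2)]   # B(s_{i+1})

ito_sum   <- sum(left * (right - left))      # left-point sums
strat_sum <- sum(mid  * (right - left))      # mid-point sums
B_t <- B_fine[2 * n + 1]

print(c(ito = ito_sum, ito_limit = 0.5 * B_t^2 - 0.5 * t_end,
        stratonovich = strat_sum, strat_limit = 0.5 * B_t^2))
```

On the same path the two sums differ by approximately t/2, which is the quadratic variation correction of the Itô integral.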

Definition 3.13.4 A process $X$ such that $I(X)$ exists is called an Itô integrable
process.

Notice that the stochastic integral is itself a stochastic process.


3.13.1 Properties of the stochastic integral 

Let $X$ and $Y$ be Itô integrable and $a$ and $b$ two real constants. Then

• zero-mean property

$$E\left( \int_0^t X_u\, dB_u \right) = 0; \tag{3.38}$$

• Itô isometry

$$\mathrm{Var}\left( \int_0^t X_u\, dB_u \right) = E\left( \int_0^t X_u\, dB_u \right)^2 = \int_0^t E(X_u^2)\, du; \tag{3.39}$$

• linearity

$$I(aX + bY) = aI(X) + bI(Y); \tag{3.40}$$

• integration of a constant

$$I(a) = \int_0^t a\, dB_u = a \int_0^t dB_u = aB_t; \tag{3.41}$$

• martingality: let $M_t = M_0 + \int_0^t X_u\, dB_u$; then $M_t$ is a martingale.
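The zero-mean property and the Itô isometry can be illustrated by Monte Carlo for the integrand X = B, for which the isometry gives a variance equal to the integral of u over [0, t], that is t^2/2. The grid size and number of paths below are arbitrary choices made here; on a finite grid the variance of the left-point sums is t^2 (1 - 1/n) / 2, slightly below the limit.

```r
set.seed(123)
t_end <- 1; n <- 200; nsim <- 5000   # grid and sample sizes (assumptions)
dt <- t_end / n

# Each column of dB holds the increments of one Brownian path.
dB <- matrix(rnorm(n * nsim, sd = sqrt(dt)), nrow = n)
B  <- apply(dB, 2, cumsum)                  # B(s_1), ..., B(s_n) for each path
B_left <- rbind(0, B[-n, , drop = FALSE])   # B(s_0), ..., B(s_{n-1})

# Left-point sums approximating the stochastic integral of B, one per path
I_B <- colSums(B_left * dB)

print(c(mean = mean(I_B), var = var(I_B), isometry = t_end^2 / 2))
```

The sample mean should be close to zero and the sample variance close to t_end^2 / 2, up to Monte Carlo and discretization error.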

Using the definition of $I(X)$ is not always very easy, as the next example shows.

Example 3.13.5 Let us calculate the stochastic integral of $B$, i.e. $\int_0^t B_s\, dB_s$. Some
trivial but lengthy algebraic steps show that the following equivalence holds true

$$
\frac{1}{2} \sum_{i=0}^{n-1} (B(s_{i+1}) - B(s_i))^2 = \frac{1}{2} B(s_n)^2 - \sum_{i=0}^{n-1} B(s_i) (B(s_{i+1}) - B(s_i)),
$$

with $s_n = t$, so let us start from the above and evaluate the limit as $n \to \infty$ of

$$
\sum_{i=0}^{n-1} B(s_i) (B(s_{i+1}) - B(s_i)) = \frac{1}{2} B(s_n)^2 - \frac{1}{2} \sum_{i=0}^{n-1} (B(s_{i+1}) - B(s_i))^2.
$$

This limit yields

$$\frac{1}{2} B_t^2 - \frac{1}{2} [B, B]_t = \frac{1}{2} B_t^2 - \frac{1}{2} t,$$

therefore,

$$\int_0^t B_s\, dB_s = \frac{1}{2} B_t^2 - \frac{1}{2} t. \tag{3.42}$$

¹ Although the Stratonovich integral shares similar properties with ordinary integrals.
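Formula (3.42) can be illustrated on a single simulated path. In the sketch below, where the number of intervals is an arbitrary choice made here, the left-point sum satisfies the algebraic identity of the example exactly, and it approaches the closed form as the discrete quadratic variation approaches t.

```r
set.seed(123)
t_end <- 1
n     <- 2^14                                   # number of intervals (assumption)
dB <- rnorm(n, mean = 0, sd = sqrt(t_end / n))  # Brownian increments
B  <- c(0, cumsum(dB))                          # B(s_0), ..., B(s_n), with B(0) = 0

ito_sum <- sum(B[1:n] * dB)         # left-point sums of B(s_i) * increment
qv      <- sum(dB^2)                # discrete quadratic variation
closed  <- 0.5 * B[n + 1]^2 - 0.5 * t_end   # right-hand side of (3.42)

print(c(ito_sum = ito_sum, closed_form = closed, quad_var = qv))
```

The gap between ito_sum and closed is exactly half the gap between the discrete quadratic variation and t, so it shrinks as the partition is refined.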


From the previous example we obtained a new property of the Itô integral,
expressed by Equation (3.42). Looking closely at (3.42) we also notice an
uncommon property of $I(X)$. Indeed, consider a deterministic and differentiable
function $f(\cdot)$ with $f(0) = 0$, and calculate $\int_0^t f(s)\, df(s)$. Using the integration by parts
formula we obtain

$$\int_0^t f(s)\, df(s) = f(t)^2 - \int_0^t f(s)\, df(s)$$

and hence

$$\int_0^t f(s)\, df(s) = \frac{1}{2} f(t)^2.$$

The last formula above and (3.42) differ in the additional term $-t/2$ present in
the stochastic version. So we can conclude two things: (i) if available, the formula
of integration by parts for the stochastic integral should look different from the
usual formulas; (ii) calculating the stochastic integral by applying the definition is not
an easy task. Itô's formula will help in both cases, but first we give a formal
definition of an Itô process.

Definition 3.13.6 (Itô process) Let $\{X_t, t \ge 0\}$ be a stochastic process that can
be written as follows:

$$X_t = X_0 + \int_0^t g(s, \omega)\, ds + \int_0^t h(s, \omega)\, dB_s \tag{3.43}$$

with $g(s, \omega)$ and $h(s, \omega)$ two adapted and progressively measurable random functions
such that

$$P\left\{ \int_0^t |g(s, \omega)|\, ds < \infty \text{ for all } t \ge 0 \right\} = 1$$

and

$$P\left\{ \int_0^t h(s, \omega)^2\, ds < \infty \text{ for all } t \ge 0 \right\} = 1.$$

Then $X$ is called an Itô process.

Essentially an Itô process is a process which can be written in the form of
a stochastic integral, which is indeed a stochastic process, plus some random
variable. Conditions on $g(\cdot)$ and $h(\cdot)$ only require the existence of the integrals
apart from null measure sets. Clearly Brownian motion is an Itô process because
it can be rewritten as in (3.43) with $g(s, \omega) = 0$ and $h(s, \omega) = 1$. A particular
class of Itô processes, which will be defined later, is the class of diffusion processes.
3.13.2 Itô formula

Calculating a stochastic integral essentially means rewriting it in a form in which
the integral with respect to Brownian motion disappears. One way to obtain this
result very quickly is the famous Itô formula which we are going to introduce.
The Itô formula (sometimes called Itô's Lemma) is a sort of Taylor formula for Itô
integrable processes. We are not going to prove the Itô formula, but try to give
an intuitive derivation. As usual, we start with two nonrandom functions $f(\cdot)$
and $g(\cdot)$, assuming all regularity conditions needed without mentioning them
explicitly. Let us write

$$\frac{d}{dt} f(g(t)) = f'(g(t))\, g'(t),$$

then

$$f(g(t)) = f(g(0)) + \int_0^t f'(g(s))\, g'(s)\, ds$$

and hence

$$f(g(t)) = f(g(0)) + \int_0^t f'(g(s))\, dg(s).$$

Now, if we replace $g(\cdot)$ with the Brownian motion, we obtain the following
formal writing:

$$f(B_t) = f(0) + \int_0^t f'(B_s)\, dB_s,$$

which is actually false (recall that we used differentiability of $g(\cdot)$ in the above).
The right formula is in fact the Itô formula.

Lemma 3.13.7 (Itô formula) Let $f(\cdot)$ be two times differentiable with respect
to its argument and measurable. Then

$$f(B_t) = f(0) + \int_0^t f'(B_s)\, dB_s + \frac{1}{2} \int_0^t f''(B_s)\, ds. \tag{3.44}$$

We now show how to obtain (3.42) via the Itô formula.

Example 3.13.8 Consider $f(x) = x^2$ and let us calculate $f(B_t)$ using (3.44).
Clearly $f'(x) = 2x$ and $f''(x) = 2$, and in our case $x$ is the Brownian motion.
We have

$$f(B_t) = f(B_0) + \int_0^t f'(B_s)\, dB_s + \frac{1}{2} \int_0^t f''(B_s)\, ds$$

and

$$B_t^2 = 0^2 + \int_0^t 2B_s\, dB_s + \frac{1}{2} \int_0^t 2\, ds,$$

from which we easily obtain (3.42)

$$\int_0^t B_s\, dB_s = \frac{1}{2} B_t^2 - \frac{1}{2} t.$$

If we imagine expanding $f(B_t)$ in Taylor series up to the second order, we
obtain the following formal expression

$$df(B_t) = f'(B_t)\, dB_t + \frac{1}{2} f''(B_t) (dB_t)^2 + \text{rest};$$

for $B_t$ we know from Section 3.7 that the quadratic variation is $(dB_t)^2 = dt$. Itô's
result shows that what is of order greater than $dt$ goes to zero, hence the rest
is indeed negligible and we obtain again (3.44). The general rule for taking into
account the order of the differentials is that all cross products of the form '$dB_t \cdot
dt$' or their powers are negligible, the same holds for the powers $(dt)^k$ with $k \ge 2$
and, of course, $(dB_t)^2 = dt$. So, after the application of the Itô formula, all the
differential terms different from $dt = (dB_t)^2$ and $dB_t$ can be neglected.
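The role of the second-order term can be seen numerically. The sketch below, an illustration with a test function chosen here, takes f(x) = x^3, for which (3.44) gives B_t^3 as three times the stochastic integral of B^2 plus three times the time integral of B. Dropping the second term, as the false chain rule would suggest, leaves a visible gap on a simulated path.

```r
set.seed(123)
t_end <- 1; n <- 2^14; dt <- t_end / n   # grid (assumption)
dB <- rnorm(n, sd = sqrt(dt))            # Brownian increments
B  <- c(0, cumsum(dB))                   # B(s_0), ..., B(s_n)
B_left <- B[1:n]                         # B(s_0), ..., B(s_{n-1})

# f(x) = x^3, so f'(x) = 3 x^2 and f''(x) = 6 x
lhs        <- B[n + 1]^3                         # f(B_t) - f(0)
stoch_part <- sum(3 * B_left^2 * dB)             # stochastic integral of f'(B)
corr_part  <- 0.5 * sum(6 * B_left * dt)         # one half of the time integral of f''(B)
rhs        <- stoch_part + corr_part

print(c(lhs = lhs, ito_rhs = rhs, without_correction = stoch_part))
```

The first two numbers should nearly coincide, while the third differs from them by the value of corr_part.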

A general trick to successfully calculate a stochastic integral via the Itô formula
is to identify the function $f(\cdot)$ or, better, its derivative. Suppose we want to
calculate

$$\int_0^t g(B_s)\, dB_s.$$

The Itô formula is instead given as

$$f(B_t) = f(0) + \int_0^t f'(B_s)\, dB_s + \frac{1}{2} \int_0^t f''(B_s)\, ds$$

which can be rewritten as

$$\int_0^t f'(B_s)\, dB_s = f(B_t) - f(0) - \frac{1}{2} \int_0^t f''(B_s)\, ds$$

so we need to consider the identity $f'(B_s) = g(B_s)$ and try to recognize $f(\cdot)$
from its derivative $f'(\cdot)$. In the above we had $g(x) = x$, hence the integral $\int_0^t B_s\, dB_s$, and
we need to set the identity

$$\int_0^t f'(B_s)\, dB_s = \int_0^t B_s\, dB_s$$

i.e. $f'(x) = x$, from which it is clear that $f(x) = \frac{1}{2} x^2$. The next exercises are
examples of use of the Itô formula.


Exercise 3.7 Calculate 
Sometimes, stochastic integrals involve more complex functions of the Brownian
motion, for example functions of both $t$ and $x$. In this case, we have this version
of the Itô lemma.

Lemma 3.13.9 (Ito formula) Let f{t,x ) be a measurable function, two times 
differentiable with respect to x and one time differentiable in t. Then 



fxx Cl B s )ds 
(3.45) 


d 2 f(t,x) 

dx 2 



Exercise 3.8 Prove that 



It is also possible to prove the following formula of integration by parts. Let $f = f(s)$ be a nonstochastic function, continuous and with finite variation on $[0, t]$. Then

$$\int_0^t f(s)\,dB_s = f(t)B_t - \int_0^t B_s\,df(s).$$

We now present a version of the Itô formula which holds for Itô processes.

Lemma 3.13.10 (Itô formula) Let $f(t, x)$ be a measurable function, twice differentiable with respect to $x$ and once differentiable in $t$, and let $X$ be an Itô process. Then

$$f(t, X_t) = f(0, X_0) + \int_0^t f_t(s, X_s)\,ds + \int_0^t f_x(s, X_s)\,dX_s + \frac{1}{2}\int_0^t f_{xx}(s, X_s)(dX_s)^2. \qquad (3.46)$$

Clearly (3.46) reduces to (3.45) if $X_t$ is $B_t$, and (3.45) reduces to (3.44) if $f(t, x) = f(x)$. Sometimes it is useful to present the Itô formula in its differential form, which looks as follows:

$$df(t, X_t) = f_t(t, X_t)\,dt + f_x(t, X_t)\,dX_t + \frac{1}{2}f_{xx}(t, X_t)(dX_t)^2.$$

This version is useful to express the dynamics of stochastic processes which are transformations of Itô processes, as we will see in detail later.

Exercise 3.9 Calculate $\int_0^t (2B_s^n - s^2)\,dB_s$, $n \geq 1$.

3.14 More properties and inequalities for the Itô integral

Due to the fact that the stochastic integral is a martingale, we can derive some inequalities. Throughout this section we assume that $X_t$ and $Y_t$ are two Itô-integrable processes.

Theorem 3.14.1 



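Theorem 3.14.3 Assume that for some $p \geq 1$ we have

$$E\left\{\int_0^T |X_t|^{2p}\,dt\right\} < \infty.$$

Then,

$$E\left\{\left|\int_0^T X_t\,dB_t\right|^{2p}\right\} \leq (p(2p-1))^p\,T^{p-1}\,E\left\{\int_0^T |X_t|^{2p}\,dt\right\}.$$

Theorem 3.14.4 (Burkholder-Davis-Gundy) There exists a constant $C_p > 0$ depending only on $p$ such that

$$E\left\{\sup_{0 \leq t \leq T}\left|\int_0^t X_s\,dB_s\right|^{2p}\right\} \leq C_p\,E\left\{\left(\int_0^T X_t^2\,dt\right)^p\right\}.$$

In particular,

$$E\left\{\sup_{0 \leq t \leq T}\left|\int_0^t X_s\,dB_s\right|^{2}\right\} \leq 4\,E\left\{\int_0^T X_t^2\,dt\right\}.$$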
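The last inequality can be illustrated by simulation. The sketch below rests on assumptions that are not part of the text: the integrand is chosen as $X_s = B_s$, the time grid is regular on $[0, 1]$, and the names `M`, `n`, `lhs` and `rhs` are purely illustrative. Both sides of the inequality are estimated by Monte Carlo, with the Itô integral replaced by left-point sums.

```r
set.seed(1)
M    <- 2000          # number of simulated paths (an assumption)
n    <- 500           # time steps per path (an assumption)
Tend <- 1
dt   <- Tend / n

# Brownian increments, one path per row
dB <- matrix(rnorm(M * n, sd = sqrt(dt)), nrow = M)
B  <- t(apply(dB, 1, cumsum))       # B at times t_1, ..., t_n
Bleft <- cbind(0, B[, -n])          # B at times t_0, ..., t_{n-1}

# running Ito sums of the integrand X_s = B_s along each path
I <- t(apply(Bleft * dB, 1, cumsum))

# left-hand side: E sup_t |int_0^t X_s dB_s|^2
lhs <- mean(apply(I^2, 1, max))

# right-hand side: 4 E int_0^T X_t^2 dt (equal to 2 for this integrand)
rhs <- 4 * mean(rowSums(Bleft^2) * dt)

print(c(lhs = lhs, rhs = rhs))
```

With this choice of integrand the right-hand side is $4\int_0^1 t\,dt = 2$, and the estimated left-hand side should stay below it.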




Theorem 3.14.5 For any Itô-integrable process $X$ we have that

3.15 Stochastic differential equations

In the previous section we have seen that stochastic differential equations of the form

$$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$$

have a mathematical meaning in the sense of Itô integrals

$$S_t = S_0 + \mu\int_0^t S_u\,du + \sigma\int_0^t S_u\,dB_u.$$

We have also introduced Itô processes of the form

$$X_t = X_0 + \int_0^t g_u\,du + \int_0^t h_u\,dB_u$$

where $g_t$ and $h_t$ are stochastic processes like in the geometric Brownian motion above, i.e. $g_t = \mu S_t$ and $h_t = \sigma S_t$. We now introduce the class of diffusion processes, which are solutions to stochastic differential equations of the following form:

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \qquad (3.48)$$

with some initial condition $X_0$. The initial condition can be random or not. If random, say $X_0 = Z$, it should be independent of the $\sigma$-algebra generated by $B$ and satisfy the condition $E|Z| < \infty$. The two deterministic functions $b(\cdot, \cdot)$ and $\sigma^2(\cdot, \cdot)$ are called respectively the drift and the diffusion coefficients of the stochastic differential equation (3.48). In order to have a well-defined stochastic differential equation, the drift and diffusion coefficients should be measurable functions and satisfy

$$P\left\{\int_0^T \sup_{|x| \leq R}\left(|b(t, x)| + \sigma^2(t, x)\right)dt < \infty\right\} = 1, \quad \text{for all } T, R \in [0, \infty). \qquad (3.49)$$

The above condition is enough to satisfy Definition 3.13.6 of Itô processes. Geometric Brownian motion is clearly an example of a diffusion process.
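To make equation (3.48) concrete, a path of a diffusion can be approximated on a grid by the Euler (Euler-Maruyama) scheme, which replaces $dt$ and $dB_t$ by finite increments. This is only a sketch and not a method taken from the text: the function name `euler_maruyama` and all parameter values are hypothetical. For the geometric Brownian motion the result is compared with the well-known closed-form solution $S_t = S_0\exp\{(\mu - \sigma^2/2)t + \sigma B_t\}$ driven by the same Brownian increments.

```r
# Euler scheme for dX_t = b(t, X_t) dt + sigma(t, X_t) dB_t.
# b and sigma are functions of (t, x); dB is a vector of Brownian increments.
euler_maruyama <- function(b, sigma, x0, dB, dt) {
  n <- length(dB)
  x <- numeric(n + 1)
  x[1] <- x0
  for (i in seq_len(n)) {
    t_i <- (i - 1) * dt
    x[i + 1] <- x[i] + b(t_i, x[i]) * dt + sigma(t_i, x[i]) * dB[i]
  }
  x
}

set.seed(42)
mu   <- 0.05          # drift parameter (an assumption)
sig  <- 0.2           # volatility parameter (an assumption)
S0   <- 1
n    <- 10000
Tend <- 1
dt   <- Tend / n
dB   <- rnorm(n, sd = sqrt(dt))

# geometric Brownian motion: b(t, x) = mu * x, sigma(t, x) = sig * x
S <- euler_maruyama(function(t, x) mu * x,
                    function(t, x) sig * x,
                    S0, dB, dt)

# closed-form solution at time Tend, using the same Brownian path
exact <- S0 * exp((mu - 0.5 * sig^2) * Tend + sig * sum(dB))

print(c(euler = S[n + 1], exact = exact))
```

The same function can be reused for other drift and diffusion coefficients, provided they satisfy conditions under which a solution exists.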

3.15.1 Existence and uniqueness of solutions 

In Chapter 1 we introduced stochastic differential equations as a way to model 
asset dynamics, but writing down a stochastic differential equation does not 
necessarily mean that a stochastic process solution to it exists. 

Moreover, a solution to a stochastic differential equation can be of two different types: weak and strong. Two weak solutions of a stochastic differential equation are stochastic processes which are equal only in distribution but are not pathwise identical. Strong solutions are, on the contrary, pathwise determined. A strong solution is also a weak solution, but the converse is not necessarily true. Statistical inference usually requires only weak solutions because it is based on distributional properties of the data, while some numerical analysis may also require strong solutions. We will now introduce a set of conditions which are easier to verify than (3.49) and which also qualify the type of solution. The first condition is called the global Lipschitz condition.

Assumption 1 For all $x, y \in \mathbb{R}$ and $t \in [0, T]$, there exists a constant $K < +\infty$ such that

$$|b(t, x) - b(t, y)| + |\sigma(t, x) - \sigma(t, y)| < K|x - y|. \qquad (3.50)$$

Next is the linear growth condition, which implies that the solution $X_t$ does not explode in a finite time.

Assumption 2 For all $x \in \mathbb{R}$ and $t \in [0, T]$, there exists a constant $C < +\infty$ such that

$$|b(t, x)| + |\sigma(t, x)| < C(1 + |x|). \qquad (3.51)$$

Theorem 5.2.1 in Øksendal (1998) states that under conditions (3.50) and (3.51), the stochastic differential equation (3.48) has a unique, continuous, and adapted strong solution such that $E\left\{\int_0^T |X_t|^2\,dt\right\} < \infty$.

Exercise 3.10 Write the stochastic differential equation of the process $Y_t = B_t^2$.

Exercise 3.11 Write the stochastic differential equation of the process $Y_t = 2 + t + e^{B_t}$.

Exercise 3.12 Write the stochastic differential equation of the process $U_t = \int_0^t e^{-\lambda(t-s)}\,dB_s$, with $\lambda > 0$.

Exercise 3.13 Let $X$ be a solution to $dX_t = X_t\,dt + \sigma\,dB_t$ and let $Y_t = f(X_t)$, with $f(x) = \sin(x)$. Write the stochastic differential equation for $Y_t$.

Exercise 3.14 Find the solution of the following stochastic differential equation: $dX_t = b(t)X_t\,dt + \sigma(t)X_t\,dB_t$. Assume that all necessary conditions on $b(t)$ and $\sigma(t)$ are fulfilled.

Exercise 3.15 Show that $X_t = \frac{B_t}{1+t}$ is a solution to $dX_t = -\frac{1}{1+t}X_t\,dt + \frac{1}{1+t}\,dB_t$ with $X_0 = 0$.

Exercise 3.16 Prove that $X_t = \left(a^{1/3} + \frac{1}{3}B_t\right)^3$ is a solution to $dX_t = \frac{1}{3}X_t^{1/3}\,dt + X_t^{2/3}\,dB_t$, $X_0 = a > 0$.

Exercise 3.17 Write the stochastic differential equation for Y, = -j-pg-. with 
Y 0 = 1. 




Exercise 3.18 Write the stochastic differential equation for X t —t + {\—t) 

fo 

Exercise 3.19 Let $X_t$ be a stochastic process solution to $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$ and let $Y_t$ be the Lamperti transform of $X_t$, i.e. $Y_t = f(X_t)$ with $f(x) = \int_{x_0}^{x} \frac{1}{\sigma(u)}\,du$ and $x_0$ any real number in the support of $X_t$. Assume that $\sigma(\cdot)$ is twice differentiable and derive the stochastic differential equation for $Y_t$.


3.16 Girsanov’s theorem for diffusion processes 

Girsanov’s theorem is a change-of-measure theorem for stochastic processes. It is a necessary tool to perform inference for stochastic processes, because it yields the likelihood function of the process, and in finance it is also a necessary tool to obtain equivalent martingale measures. As we will see in Chapter 6, the existence of an equivalent martingale measure is a necessary condition to avoid arbitrage in the market. Consider the three stochastic differential equations

$$dX_t = b_1(X_t)\,dt + \sigma(X_t)\,dW_t, \quad X_0^{(1)}, \quad 0 \leq t \leq T,$$

$$dX_t = b_2(X_t)\,dt + \sigma(X_t)\,dW_t, \quad X_0^{(2)}, \quad 0 \leq t \leq T,$$

$$dX_t = \sigma(X_t)\,dW_t, \quad X_0, \quad 0 \leq t \leq T,$$

and denote by $P_1$, $P_2$, and $P$ the probability measures of $X_t$ under the three models.

Theorem 3.16.1 (Liptser and Shiryayev (1977)) Assume that conditions (3.50) and (3.51) are satisfied. Assume further that the initial values are either random variables with densities $f_1(\cdot)$, $f_2(\cdot)$, and $f(\cdot)$ with the same common support, or nonrandom and equal to the same constant. Then the three measures $P_1$, $P_2$, and $P$ are all equivalent and the corresponding Radon-Nikodym derivatives $(Z_T, Z'_T)$ are

$$Z_T = \frac{dP_1}{dP}(X) = \frac{f_1(X_0)}{f(X_0)}\exp\left\{\int_0^T \frac{b_1(X_s)}{\sigma^2(X_s)}\,dX_s - \frac{1}{2}\int_0^T \frac{b_1^2(X_s)}{\sigma^2(X_s)}\,ds\right\} \qquad (3.52)$$

and

$$Z'_T = \frac{dP_2}{dP_1}(X) = \frac{f_2(X_0)}{f_1(X_0)}\exp\left\{\int_0^T \frac{b_2(X_s) - b_1(X_s)}{\sigma^2(X_s)}\,dX_s - \frac{1}{2}\int_0^T \frac{b_2^2(X_s) - b_1^2(X_s)}{\sigma^2(X_s)}\,ds\right\}. \qquad (3.53)$$

3.17 Local martingales and semimartingales

Definition 3.17.1 A process $\{X_t, t \geq 0\}$ adapted to a filtration $\mathcal{F}$ is called a local martingale if there exists a sequence of stopping times $\tau_k : \Omega \to [0, +\infty)$ such that

(i) the sequence $\tau_k$ is almost surely increasing, i.e. $P(\tau_k < \tau_{k+1}) = 1$;

(ii) the sequence $\tau_k$ diverges almost surely, i.e. $P(\tau_k \to \infty, k \to \infty) = 1$;

(iii) the stopped process

$$1_{\{\tau_k > 0\}}X_t^{\tau_k} = 1_{\{\tau_k > 0\}}X_{t \wedge \tau_k}$$

is a martingale for every $k$.

The sequence of stopping times in the definition of local martingale is usually called a localizing sequence.

Example 3.17.2 Let $T = \min\{t : B_t = -1\}$. We know that $T$ is a stopping time. Consider the stopped process $B_t^T = B_{t \wedge T}$. Then $E(B_t^T) = 0$ for all $t$. It is also straightforward to prove that it is a martingale. However, the following rescaled process

$$X_t = \begin{cases} B^T_{\frac{t}{1-t}}, & 0 \leq t < 1 \\ -1, & t \geq 1 \end{cases}$$

is such that

$$E(X_t) = \begin{cases} 0, & 0 \leq t < 1 \\ -1, & t \geq 1 \end{cases}$$

so it is clearly not a martingale. Nevertheless, it is a local martingale with respect to the sequence $\tau_k = \min\{t : X_t = k\}$, although we do not give the proof here.

In general, local martingales are not martingales, and local martingales arise often in the definition of the stochastic integral with respect to martingales. It turns out that such a stochastic integral is, in general, not a martingale but just a local martingale.

Definition 3.17.3 A real-valued stochastic process $\{X_t, t \geq 0\}$, adapted to a filtration $\{\mathcal{F}_t, t \geq 0\}$, is called a semimartingale if it can be decomposed as follows:

$$X_t = M_t + A_t$$

where $\{M_t, t \geq 0\}$ is a local martingale and $\{A_t, t \geq 0\}$ is a càdlàg adapted process of locally bounded variation.




Most of the processes discussed in this book are semimartingales, and semimartingales are the widest class of stochastic processes for which the stochastic integral can be defined. We will use the notions of local martingale and semimartingale whenever needed without going into deep details. A good reference on the topic is Protter (2004).

3.18 Lévy processes

Lévy processes were introduced as the sum of a compound and compensated Poisson process and a Brownian motion with drift. The original idea was to construct a family of processes wide enough to comprise a variety of well-known other stochastic processes and more. The most relevant references are Lévy (1954), Bertoin (1998) and Sato (1999), but many new publications have appeared recently due to applications of these models in finance. We start with a gentle introduction before giving a formal definition. Assume we have two processes $X_t$ and $M_t$,

$$X_t = \mu t + \sigma B_t, \quad \mu \in \mathbb{R}, \ \sigma \geq 0,$$

where $B_t$ is a Brownian motion and

$$M_t = \sum_{i=0}^{N_t} Y_{\tau_i} - \lambda t\,E(Y_{\tau_i}), \quad \lambda > 0,$$

with $N_t$ a homogeneous Poisson process and $Y_{\tau_i}$ a sequence of i.i.d. random variables such that $E(Y_{\tau_i}) < \infty$. We assume that $N_t$, $B_t$ and the $Y_{\tau_i}$’s are all independent. We introduce the process $Z_t$, which we can think of as an embryo of a Lévy process,

$$Z_t = X_t + M_t. \qquad (3.54)$$

Remember that $M_t$ is a martingale by construction and, if $\mu = 0$, then $X_t$ is also a martingale, due to the fact that Brownian motion is itself a martingale. Therefore, in that case, we immediately see that $Z_t$ is also a martingale. For $Z_t$ in (3.54) we can also easily derive the characteristic function due to the independence of $X_t$ and $M_t$. Indeed,

$$\varphi_{X_t}(u) = \exp\left\{t\left(i\mu u - \frac{1}{2}\sigma^2u^2\right)\right\}$$

and

$$\varphi_{M_t}(u) = \exp\left\{\lambda t\int_{-\infty}^{\infty}\left(e^{iux} - 1 - iux\right)dF(x)\right\}$$

where $F(\cdot)$ is the common distribution function of the markers $Y_{\tau_i}$. Then, we obtain the so-called de Finetti characteristic function for $Z_t$

$$\varphi_{Z_t}(u) = \exp\left\{t\left(i\mu u - \frac{1}{2}\sigma^2u^2 + \lambda\int_{-\infty}^{\infty}\left(e^{iux} - 1 - iux\right)dF(x)\right)\right\}.$$

This is nice but not enough to describe the general framework of Lévy processes. Assume that we add another independent Brownian motion to $Z_t$; then the resulting characteristic function does not change too much, because the sum of independent Gaussian random variables is a new Gaussian random variable, so $\varphi_{Z_t}(u)$ will remain in essentially the same form. But things do change if we add another jump component. Assume further that there are two compound and compensated Poisson processes, say $M_t^1$ and $M_t^2$. Let $M_t^i$, $i = 1, 2$, have intensity $\lambda_i$ and the corresponding markers a distribution $F_i(x)$. Then, the characteristic function of $M_t^1 + M_t^2$ will have an exponent like the following

$$\lambda_1 t\int_{\mathbb{R}}\left(e^{iux} - 1 - iux\right)dF_1(x) + \lambda_2 t\int_{\mathbb{R}}\left(e^{iux} - 1 - iux\right)dF_2(x) = t\int_{\mathbb{R}}\left(e^{iux} - 1 - iux\right)\nu(dx)$$

with $\nu(dx) = \lambda_1\,dF_1(x) + \lambda_2\,dF_2(x)$. The measure $\nu(\cdot)$ plays a central role in the construction of the Lévy process and it will indeed be called the Lévy measure. The idea of the Lévy measure is that it contains all the information about the jumps of the process $Z_t$, and so in general one wants to model directly the measure $\nu(\cdot)$ instead of the various $F_i(\cdot)$. Due to the fact that jumps can be negative or positive but reasonably not null, the measure $\nu(\cdot)$ will have a discontinuity at 0. Moreover, while being surely positive, for any two numbers $\lambda_1 > 0$ and $\lambda_2 > 0$, the measure $\nu(\cdot)$ does not necessarily integrate to 1. So the general construction of Lévy processes requires proper restrictions on the general form of $\nu(\cdot)$. We now give the general definition of a Lévy process.

Definition 3.18.1 A càdlàg, adapted, real-valued stochastic process $\{Z_t, 0 \leq t \leq T\}$, with $Z_0 = 0$ almost surely, is called a Lévy process if the following properties are satisfied:

(i) $Z_t$ has independent increments, i.e. $Z_t - Z_s$ is independent of $\mathcal{F}_s$, for any $0 \leq s < t \leq T$;

(ii) $Z_t$ has stationary increments, i.e. for any $0 \leq s, t \leq T$ the distribution of $Z_{t+s} - Z_t$ does not depend on $t$;

(iii) $Z_t$ is stochastically continuous, i.e. for every $0 \leq t \leq T$ and $\epsilon > 0$, we have that

$$\lim_{s \to t} P(|Z_t - Z_s| > \epsilon) = 0.$$

It is important to notice that property (iii) is not related to the continuity of the paths (indeed, it is a càdlàg process) but only to the property of the distributions of the increments. The simplest Lévy process is the deterministic linear drift process, i.e. $Z_t = \mu t$ for some constant $\mu$, which has continuous paths, and the only nondeterministic Lévy process with continuous paths is the Brownian motion. The Poisson and compound Poisson processes are also Lévy processes, and of course the process $Z_t$ in (3.54) is also a Lévy process. The process in (3.54) is sometimes called a Lévy jump diffusion to distinguish it from other jump diffusion processes which are not of Lévy type.
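The jump diffusion (3.54) is easy to simulate at a fixed time, which allows a rough check of its characteristic function. The sketch below makes several assumptions that are not in the text: the markers are taken to be Gaussian, $Y \sim N(m, s^2)$, so that $\int (e^{iux} - 1 - iux)\,dF(x) = e^{ium - s^2u^2/2} - 1 - ium$; the sum of jumps runs over the $N_t$ jumps occurred up to time $t$; and all names and parameter values are illustrative. With $\mu = 0$ the sample mean of $Z_t$ should be close to zero, in line with the martingale property.

```r
set.seed(2024)
mu     <- 0       # drift (zero, so that Z_t has mean zero)
sigma  <- 0.3     # diffusion coefficient (an assumption)
lambda <- 5       # jump intensity (an assumption)
m      <- 0.1     # mean of the Gaussian markers (an assumption)
s      <- 0.2     # standard deviation of the markers (an assumption)
tt     <- 1       # time at which Z_t is observed
M      <- 100000  # number of simulated values of Z_t

# number of jumps up to time tt for each replication
N <- rpois(M, lambda * tt)

# sum of N i.i.d. N(m, s^2) markers is N(N * m, N * s^2); it is 0 when N = 0
jumps <- rnorm(M, mean = N * m, sd = s * sqrt(N))

# Z_t = mu * t + sigma * B_t + compensated compound Poisson part
Z <- mu * tt + sigma * sqrt(tt) * rnorm(M) + jumps - lambda * tt * m

# empirical versus theoretical characteristic function at one point u
u <- 1.5
emp  <- mean(exp(1i * u * Z))
theo <- exp(tt * (1i * mu * u - 0.5 * sigma^2 * u^2 +
                  lambda * (exp(1i * u * m - 0.5 * s^2 * u^2) - 1 - 1i * u * m)))

print(c(empirical = emp, theoretical = theo))
print(mean(Z))
```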

3.18.1 Lévy-Khintchine formula

If we look at the characteristic function of the random variable $Z_t$ in (3.54) for the Lévy jump diffusion

$$\varphi_{Z_t}(u) = \exp\left\{t\left(i\mu u - \frac{1}{2}\sigma^2u^2 + \lambda\int_{\mathbb{R}}\left(e^{iux} - 1 - iux\right)dF(x)\right)\right\}$$

and we compare it with the Lévy-Khintchine formula (2.16), we see that for this process the characteristic triplet is

$$(b = \mu t, \ c = \sigma^2 t, \ \nu = (\lambda F) \cdot t)$$

where $b$ is called the drift term, $c$ the Gaussian or diffusion term and $\nu$ is the Lévy measure. This means that the marginal law of the simple jump diffusion process $Z_t$ is infinitely divisible. This fact is actually true for any Lévy process. We can indeed rewrite a general Lévy process $Z_t$ of Definition 3.18.1 using increments over a regular grid of time points of size $t/n$ for some integer $n$, i.e. we can write the following telescopic sum of increments

$$Z_t = Z_{\frac{t}{n}} + \left(Z_{\frac{2t}{n}} - Z_{\frac{t}{n}}\right) + \cdots + \left(Z_t - Z_{\frac{(n-1)t}{n}}\right).$$

By stationarity and independence of the increments, $\left(Z_{\frac{kt}{n}} - Z_{\frac{(k-1)t}{n}}\right)$, $k = 1, \ldots, n$, is a sequence of i.i.d. random variables, so we can readily see that $Z_t$ is, in the general case, also infinitely divisible.

Theorem 3.18.2 Let $Z_t$ be any Lévy process and let $\psi(u)$ be the characteristic exponent of the random variable $Z_1$. Then

$$E\left\{e^{iuZ_t}\right\} = e^{t\psi(u)} = \exp\left\{t\left(ibu - \frac{u^2c}{2} + \int_{\mathbb{R}}\left(e^{iux} - 1 - iux1_{\{|x| < 1\}}\right)\nu(dx)\right)\right\}.$$

Proof We present some hints of the proof, because it is instructive. The first fact that we notice is the following decomposition of the characteristic function of $Z_{t+s}$:

$$\begin{aligned}
\varphi_{Z_{t+s}}(u) &= E\left\{e^{iuZ_{t+s}}\right\} = E\left\{e^{iu(Z_{t+s} - Z_s)}e^{iuZ_s}\right\} \quad \text{(by independence)} \\
&= E\left\{e^{iu(Z_{t+s} - Z_s)}\right\} \cdot E\left\{e^{iuZ_s}\right\} \quad \text{(by stationarity)} \\
&= E\left\{e^{iuZ_t}\right\} \cdot E\left\{e^{iuZ_s}\right\} \\
&= \varphi_{Z_t}(u)\varphi_{Z_s}(u).
\end{aligned}$$

Let us denote $\varphi_{Z_t}(u)$ by $\varphi_t(u)$. Then, we have obtained the so-called Cauchy functional equation

$$\varphi_{t+s}(u) = \varphi_t(u)\varphi_s(u)$$

and we know that $\varphi_t(0) = 1$ for all $t \geq 0$ and that, as a function of $t$, the characteristic function of the Lévy process is continuous due to the property of stochastic continuity. The only continuous solution to this Cauchy functional equation is the function of the form:

$$\varphi_t(u) = e^{tg(u)}, \quad \text{with } g(u) : \mathbb{R} \to \mathbb{C}.$$

Since $Z_1$ has an infinitely divisible distribution, the statement of the theorem holds with $g(u) = \psi(u)$.

Theorem 3.18.2 is rather important because it says that the law of $Z_t$ can be obtained from the law of $Z_1$ and, because the law of $Z_1$ is infinitely divisible, so is the law of $Z_t$. But this also means that, for a given random variable $X$ with an infinitely divisible distribution, we can always build a Lévy process with that distribution simply by setting $Z_1 \sim X$.

3.18.2 Lévy jumps and random measures

Let $\Delta Z = \{\Delta Z_t, 0 \leq t \leq T\}$ be the jump process associated with the Lévy process $Z$, defined for each $t$ as

$$\Delta Z_t = Z_t - Z_{t-},$$

where $Z_{t-} = \lim_{s \uparrow t} Z_s$ is the limit from the left. By the stochastic continuity of the Lévy process, if we consider a fixed $t$, then $\Delta Z_t = 0$ almost surely. This means that a Lévy process has no fixed times of discontinuity, but nevertheless it has jumps, and the total sum of jumps may even diverge, i.e.

$$\sum_{s \leq t} |\Delta Z_s| = \infty \quad \text{a.s.}$$

but we always have that

$$\sum_{s \leq t} |\Delta Z_s|^2 < \infty \quad \text{a.s.}$$

which makes the Lévy process treatable as a martingale. Consider now a set $A \in \mathcal{B}(\mathbb{R} \setminus \{0\})$ such that $0 \notin \bar{A}$ and let $0 \leq t \leq T$. We define the following measure of the jumps of the Lévy process $Z$:

$$\mu^Z(\omega; t, A) = \#\{0 \leq s \leq t; \Delta Z_s(\omega) \in A\} = \sum_{s \leq t} 1_A(\Delta Z_s(\omega)).$$

The measure $\mu^Z$ is a random measure which counts the jumps of the process $Z$ with size in $A$ up to time $t$. The random measure has several properties, e.g.

$$\mu^Z(t, A) - \mu^Z(s, A) \in \sigma(\{Z_u - Z_v; s \leq v < u \leq t\})$$

and hence $\mu^Z(t, A) - \mu^Z(s, A)$ is independent of $\mathcal{F}_s$, i.e. $\mu^Z(\cdot, A)$ has independent increments. Further, $\mu^Z(t, A) - \mu^Z(s, A)$ is the number of jumps of $Z_{s+u} - Z_s$ in $A$ for $0 \leq u \leq t - s$. Due to the stationarity of the increments of $Z$, we have stationarity of the increments of $\mu^Z$, i.e.

$$\mu^Z(t, A) - \mu^Z(s, A) \sim \mu^Z(t - s, A).$$

From the above we obtain that $\mu^Z(\cdot, A)$ is a Poisson process, and $\mu^Z$ is then called the Poisson random measure. The intensity of this Poisson process is $\nu(A) = E\{\mu^Z(1, A)\}$.

Definition 3.18.3 The measure $\nu$ defined by

$$\nu(A) = E\left\{\mu^Z(1, A)\right\} = E\left\{\sum_{s \leq 1} 1_A(\Delta Z_s(\omega))\right\}$$

is the Lévy measure of the Lévy process $Z$.
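The interpretation of $\nu(A)$ as the expected number of jumps with size in $A$ over a unit time interval can be illustrated on a compound Poisson process. This is a sketch under assumptions that are not in the text: the intensity is $\lambda = 5$, the jump sizes are standard Gaussian and $A = [1, \infty)$, so that $\nu(A) = \lambda P(Y \geq 1)$; the variable names are hypothetical. Since $\mu^Z(1, A)$ is Poisson distributed with parameter $\nu(A)$, both the sample mean and the sample variance of the simulated counts should be close to $\nu(A)$.

```r
set.seed(99)
lambda <- 5        # jump intensity (an assumption)
M      <- 20000    # number of simulated paths on [0, 1] (an assumption)
a      <- 1        # A = [a, Inf), a set bounded away from zero

# for each path: draw the number of jumps on [0, 1], then the jump sizes,
# and count how many of them fall in A
counts <- vapply(seq_len(M), function(i) {
  n_jumps <- rpois(1, lambda)
  sum(rnorm(n_jumps) >= a)
}, integer(1))

# Levy measure of A for this compound Poisson process
nuA <- lambda * (1 - pnorm(a))

print(c(mean_count = mean(counts), var_count = var(counts), nuA = nuA))
```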

Notice that, if $f : \mathbb{R} \to \mathbb{R}$ is a Borel measurable function and finite on $A$, it is possible to define the integral with respect to $\mu^Z$ as follows:

$$\int_A f(x)\,\mu^Z(\omega; t, dx) = \sum_{s \leq t} f(\Delta Z_s)1_A(\Delta Z_s(\omega)).$$

Moreover, $\int_A f(x)\,\mu^Z(t, dx)$ defines a real-valued random variable for each $t$, and hence we can construct a càdlàg stochastic process as follows:

$$G_t = \int_0^t\int_A f(x)\,\mu^Z(ds, dx), \quad 0 \leq t \leq T. \qquad (3.55)$$

The next theorem (for a proof see Applebaum (2004), Th. 2.3.8) presents the main properties of the process $G_t$.

Theorem 3.18.4 Consider a set $A \in \mathcal{B}(\mathbb{R} \setminus \{0\})$ with $0 \notin \bar{A}$ and let $f : \mathbb{R} \to \mathbb{R}$ be a Borel measurable function, finite on $A$. Then, the stochastic process $G_t$ defined in (3.55) is a compound Poisson process such that

(i) its characteristic function is given by

$$\varphi_{G_t}(u) = E\left\{e^{iuG_t}\right\} = \exp\left\{t\int_A\left(e^{iuf(x)} - 1\right)\nu(dx)\right\};$$

(ii) if $f \in L^1(A)$, then

$$E\{G_t\} = t\int_A f(x)\,\nu(dx);$$

(iii) if $f \in L^2(A)$, then

$$\mathrm{Var}\{|G_t|\} = t\int_A |f(x)|^2\,\nu(dx).$$


3.18.3 Itô-Lévy decomposition of a Lévy process

Consider the following decomposition of the characteristic exponent of a Lévy process:

$$\psi(u) = ibu - \frac{u^2c}{2} + \int_{\mathbb{R}}\left(e^{iux} - 1 - iux1_{\{|x| < 1\}}\right)\nu(dx) = \varphi^{(1)}(u) + \varphi^{(2)}(u) + \varphi^{(3)}(u) + \varphi^{(4)}(u),$$

with

$$\varphi^{(1)}(u) = iub, \qquad \varphi^{(2)}(u) = -\frac{u^2c}{2},$$

$$\varphi^{(3)}(u) = \int_{|x| \geq 1}\left(e^{iux} - 1\right)\nu(dx), \qquad \varphi^{(4)}(u) = \int_{|x| < 1}\left(e^{iux} - 1 - iux\right)\nu(dx).$$

The first term corresponds to a deterministic linear process, say $Z_t^{(1)}$; the second, $Z_t^{(2)}$, to a Brownian motion rescaled by $\sqrt{c}$; and the third term, $Z_t^{(3)}$, corresponds to a compound Poisson process with $\lambda = \nu(\mathbb{R} \setminus (-1, 1))$ and distribution of the markers (the jumps) $F(dx) = \frac{\nu(dx)}{\lambda}1_{\{|x| \geq 1\}}$. The last term $Z_t^{(4)}$ is more difficult to describe but, intuitively, it is associated with a jump process, say $Z_t^{(4,\epsilon)}$, $\epsilon > 0$, defined as

$$Z_t^{(4,\epsilon)} = \int_0^t\int_{\epsilon < |x| < 1} x\,\mu^Z(ds, dx) - t\left(\int_{1 > |x| > \epsilon} x\,\nu(dx)\right),$$

with characteristic exponent

$$\varphi^{(4,\epsilon)}(u) = \int_{\epsilon < |x| < 1}\left(e^{iux} - 1 - iux\right)\nu(dx).$$

Then $\varphi^{(4)}(u)$ is the characteristic exponent of the process $Z_t^{(4)} = \lim_{\epsilon \downarrow 0} Z_t^{(4,\epsilon)}$. The next result states more precisely what we have shown above, and it is known as the Itô-Lévy decomposition. The proof can be found, e.g., in Sato (1999).




Theorem 3.18.5 Consider the triplet $(b, c, \nu)$ where $b \in \mathbb{R}$, $c \geq 0$, and $\nu$ is a measure such that $\nu(\{0\}) = 0$ and $\int_{\mathbb{R}}(1 \wedge |x|^2)\,\nu(dx) < \infty$. Then, there exist, on some probability space, four independent stochastic processes $Z^{(1)}$, $Z^{(2)}$, $Z^{(3)}$ and $Z^{(4)}$, where $Z^{(1)}$ is a constant drift process; $Z^{(2)}$ is a Brownian motion; $Z^{(3)}$ is a compound Poisson process; and $Z^{(4)}$ is a square integrable pure jump martingale with, almost surely, a countable number of jumps of magnitude less than 1 on each finite time interval. Moreover, the process defined as $Z = Z^{(1)} + Z^{(2)} + Z^{(3)} + Z^{(4)}$ is a Lévy process with characteristic exponent

$$\varphi(u) = iub - \frac{u^2c}{2} + \int_{\mathbb{R}}\left(e^{iux} - 1 - iux1_{\{|x| < 1\}}\right)\nu(dx), \quad u \in \mathbb{R}.$$

Thanks to the Itô-Lévy decomposition, we can always rewrite a Lévy process as follows:

$$\begin{aligned}
Z_t &= bt + \sqrt{c}B_t + \int_0^t\int_{|x| \geq 1} x\,\mu^Z(ds, dx) + \int_0^t\int_{|x| < 1} x\,(\mu^Z - \nu^Z)(ds, dx) \\
&= Z_t^{(1)} + Z_t^{(2)} + Z_t^{(3)} + Z_t^{(4)},
\end{aligned}$$

where $\nu^Z(ds, dx) = \nu(dx)\,ds$.

3.18.4 More on the Lévy measure

The basic requirements for the Lévy measure $\nu$ in the triplet $(b, c, \nu)$ are as follows:

$$\nu(\{0\}) = 0 \quad \text{and} \quad \int_{\mathbb{R}}(1 \wedge |x|^2)\,\nu(dx) < \infty.$$

Roughly speaking, the Lévy measure describes the expected number of jumps of a certain amplitude in a time interval of length 1. The Lévy measure has no mass at the origin, but many jumps may occur around the origin, i.e. there may be a large quantity of very small jumps. It is also true that jumps which are far away from the origin have bounded mass. If $\nu$ is a finite measure, we have seen that, defining $\lambda = \nu(\mathbb{R}) < \infty$, $F(dx) = \nu(dx)/\lambda$ is a probability measure and $\lambda$ is just the expected number of jumps, with $F(\cdot)$ the distribution of the jumps. But if $\nu(\mathbb{R}) = \infty$, then an infinite number of small jumps is expected. In this second case, the Lévy process is said to have infinite activity. The next set of results relates the Lévy measure to other aspects of the paths of the Lévy process. All the proofs are omitted but can be found in Sato (1999).

Theorem 3.18.6 Let $Z$ be a Lévy process with triplet $(b, c, \nu)$.

(i) if $\nu(\mathbb{R}) < \infty$, then almost all paths of $Z$ have a finite number of jumps on every compact interval. In this case, the Lévy process has finite activity;

(ii) if $\nu(\mathbb{R}) = \infty$, then almost all paths of $Z$ have an infinite number of jumps on every compact interval. In this case, the Lévy process has infinite activity.



Theorem 3.18.7 Let $Z$ be a Lévy process with triplet $(b, c, \nu)$.

(i) if $c = 0$ and $\int_{|x| \leq 1} |x|\,\nu(dx) < \infty$, then almost all paths of $Z$ have finite variation;

(ii) if $c \neq 0$ or $\int_{|x| \leq 1} |x|\,\nu(dx) = \infty$, then almost all paths of $Z$ have infinite variation.

The above results say that, when the Brownian part is missing and the sum of small jumps does not diverge, then the process $Z$ has finite variation, and vice versa. The final result concerns the finiteness of the moments of the Lévy process. The knowledge of this property will be particularly relevant in financial applications, as we will see.

Theorem 3.18.8 Let $Z$ be a Lévy process with triplet $(b, c, \nu)$.

(i) $Z_t$ has finite $p$-th moment for $p > 0$, i.e. $E|Z_t|^p < \infty$, if and only if $\int_{|x| \geq 1} |x|^p\,\nu(dx) < \infty$;

(ii) $Z_t$ has finite $p$-th exponential moment for $p > 0$, i.e. $E(e^{pZ_t}) < \infty$, if and only if $\int_{|x| \geq 1} e^{px}\,\nu(dx) < \infty$.

To summarize the above results, the small jumps (and the Brownian part) determine the variation of the process; the big jumps affect the existence of the higher moments of the process; the activity depends on the whole set of jumps of the process.

3.18.5 The Itô formula for Lévy processes

It is possible to derive the Itô formula for Lévy processes. This formula is very useful when modeling financial data through Lévy processes and in particular for the exponential Lévy model. The next result is stated without proof, but the reader can, for example, check Proposition 8.15 in Cont and Tankov (2004).

Theorem 3.18.9 (Itô formula) Let $Z_t$ be a Lévy process with Lévy triplet $(b, c, \nu)$ and $f : \mathbb{R} \to \mathbb{R}$ a measurable and twice differentiable function. Then

$$f(Z_t) = f(Z_0) + \frac{c}{2}\int_0^t f''(Z_s)\,ds + \int_0^t f'(Z_{s-})\,dZ_s + \sum_{0 \leq s \leq t,\ \Delta Z_s \neq 0}\left\{f(Z_{s-} + \Delta Z_s) - f(Z_{s-}) - \Delta Z_s f'(Z_{s-})\right\}. \qquad (3.56)$$


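Example 3.18.10 Let $Z_t$ be a Lévy process with Lévy triplet $(b, c, \nu)$ and its Itô-Lévy decomposition, and define $S_t = S_0e^{Z_t}$, with $S_0$ some constant. Then $S_t$ satisfies the stochastic differential equation

$$dS_t = S_{t-}\left\{dZ_t + \frac{c}{2}\,dt + \int_{\mathbb{R}}\left(e^x - 1 - x\right)\mu^Z(dt, dx)\right\}.$$

Indeed, applying (3.56) with $f(x) = S_0e^x$,

$$\begin{aligned}
S_t &= S_0 + \frac{c}{2}\int_0^t S_0e^{Z_s}\,ds + \int_0^t S_0e^{Z_{s-}}\,dZ_s + \sum_{0 \leq s \leq t,\ \Delta Z_s \neq 0} S_0\left(e^{Z_{s-} + \Delta Z_s} - e^{Z_{s-}} - \Delta Z_s e^{Z_{s-}}\right) \\
&= S_0 + \frac{c}{2}\int_0^t S_0e^{Z_s}\,ds + \int_0^t S_0e^{Z_{s-}}\,dZ_s + \sum_{0 \leq s \leq t,\ \Delta Z_s \neq 0} S_0e^{Z_{s-}}\left(e^{\Delta Z_s} - 1 - \Delta Z_s\right) \\
&= S_0 + \frac{c}{2}\int_0^t S_s\,ds + \int_0^t S_{s-}\,dZ_s + \sum_{0 \leq s \leq t,\ \Delta Z_s \neq 0} S_{s-}\left(e^{\Delta Z_s} - 1 - \Delta Z_s\right) \\
&= S_0 + \frac{c}{2}\int_0^t S_s\,ds + \int_0^t S_{s-}\,dZ_s + \int_0^t\int_{\mathbb{R}} S_{s-}\left(e^x - 1 - x\right)\mu^Z(ds, dx)
\end{aligned}$$

and, in differential form,

$$\begin{aligned}
dS_t &= \frac{c}{2}S_t\,dt + S_{t-}\,dZ_t + \int_{\mathbb{R}} S_{t-}\left(e^x - 1 - x\right)\mu^Z(dt, dx) \\
&= S_{t-}\left\{\frac{c}{2}\,dt + dZ_t + \int_{\mathbb{R}}\left(e^x - 1 - x\right)\mu^Z(dt, dx)\right\}.
\end{aligned}$$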

3.18.6 Lévy processes and martingales

The next theorem applies to the wide class of one-dimensional stochastic processes with independent increments and will be used in option pricing to study martingale properties of, e.g., discounted price processes.

Theorem 3.18.11 Let $\{X_t, t \geq 0\}$ be a real-valued stochastic process with independent increments. Then

(i) the process

$$\left\{\frac{e^{iuX_t}}{E\left(e^{iuX_t}\right)}, \ t \geq 0\right\}$$

is a martingale for all $u \in \mathbb{R}$;

(ii) if, for some $u \in \mathbb{R}$, $E\left(e^{uX_t}\right) < \infty$ for all $t \geq 0$, then

$$M_t = \frac{e^{uX_t}}{E\left(e^{uX_t}\right)}, \quad t \geq 0,$$

is a martingale (see e.g. Exercise 3.2);

(iii) if $E(X_t) < \infty$ for all $t \geq 0$, then

$$M_t = X_t - E(X_t), \quad t \geq 0,$$

is a martingale with independent increments;

(iv) if $\mathrm{Var}(X_t) < \infty$ for all $t \geq 0$, then

$$Q_t = M_t^2 - E\left(M_t^2\right), \quad t \geq 0,$$

is a martingale, with $M_t = X_t - E(X_t)$ (see e.g. Exercise 3.5).

Notice that, thanks to Theorem 3.18.8, for Lévy processes all the above statements are true if the corresponding moments of the Lévy measure are finite for at least one $t$. A proof can be found in Sato (1999).

Theorem 3.18.12 Consider again a Lévy process with Lévy triplet $(b, c, \nu)$ and Itô-Lévy decomposition

$$Z_t = bt + \sqrt{c}B_t + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^Z)(ds, dx)$$

such that $E|Z_t| < \infty$. Then $Z_t$ is a martingale if and only if $b = 0$.

Proof. The process $Z_t$, according to Definition 3.17.3, can be seen as a semimartingale of the form:

$$Z_t = M_t + A_t$$

where

$$A_t = \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^Z)(ds, dx)$$

is the càdlàg process. Moreover, given that

$$\int_0^t\int_{\mathbb{R}} x\,\mu^Z(ds, dx) = \sum_{0 \leq s \leq t}\Delta Z_s$$

we also have that

$$E\left\{\int_0^t\int_{\mathbb{R}} x\,\mu^Z(ds, dx)\right\} = \int_0^t\int_{\mathbb{R}} x\,\nu^Z(ds, dx) = \int_0^t\int_{\mathbb{R}} x\,\nu(dx)\,ds = t\int_{\mathbb{R}} x\,\nu(dx).$$

Thus, $E\{Z_t\} = bt$ and it turns out that such a Lévy process is a martingale if and only if $b = 0$.

We now introduce an important tool which is needed to operate the change of measure in the case of Lévy processes. This theorem is again Girsanov’s theorem, and it makes use of the Radon-Nikodym derivative introduced in Definition 2.3.5. We omit all the proofs, which can be found in, e.g., Papapantoleon (2008).

Let $P$ and $\tilde{P}$ be two measures on $(\Omega, \mathcal{A})$ and let $\mathcal{F} = \{\mathcal{F}_t, 0 \leq t \leq T\}$ be a filtration on this probability space. If $P$ and $\tilde{P}$ are equivalent, then there exists a unique, positive martingale $\{M_t, 0 \leq t \leq T\}$ with respect to $P$ such that $M_t = E\left\{\frac{d\tilde{P}}{dP} \,\middle|\, \mathcal{F}_t\right\}$. Conversely, given $P$ and a positive martingale $M_t$ with respect to $P$, one can define an equivalent measure $\tilde{P}$ using the Radon-Nikodym derivative as follows: $M_t = E\left\{\frac{d\tilde{P}}{dP} \,\middle|\, \mathcal{F}_t\right\}$.

Theorem 3.18.13 (Jacod and Shiryaev (2003)) Let $\{Z_t, 0 \leq t \leq T\}$ be a Lévy process with characteristic triplet $(b, c, \nu)$ under the measure $P$, with finite first moment and canonical decomposition

$$Z_t = bt + \sqrt{c}W_t + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^Z)(ds, dx).$$

(i) assume that $P$ and $\tilde{P}$ are two equivalent measures. Then, there exist a deterministic process $\beta$ and a measurable non-negative deterministic process $Y$ such that, almost surely under $P$, we have

$$\int_0^t\int_{\mathbb{R}} |x(Y(s, x) - 1)|\,\nu(dx)\,ds < \infty$$

and

$$\int_0^t (c\beta_s^2)\,ds < \infty.$$

(ii) conversely, if $\{M_t, 0 \leq t \leq T\}$ is a positive martingale of the form:

$$M_t = \exp\left\{\int_0^t \beta_s\sqrt{c}\,dW_s - \frac{1}{2}\int_0^t \beta_s^2c\,ds + \int_0^t\int_{\mathbb{R}} (Y(s, x) - 1)\,(\mu^Z - \nu^Z)(ds, dx) - \int_0^t\int_{\mathbb{R}} \left(Y(s, x) - 1 - \log(Y(s, x))\right)\mu^Z(ds, dx)\right\}$$

then it defines a probability measure $\tilde{P}$ equivalent to $P$.

(iii) in both cases $\tilde{W}_t = W_t - \int_0^t \sqrt{c}\beta_s\,ds$ is a Brownian motion with respect to $\tilde{P}$, $\tilde{\nu}^Z(ds, dx) = Y(s, x)\,\nu^Z(ds, dx)$ is the compensator of $\mu^Z$ under $\tilde{P}$, and $Z$ has the following canonical representation under $\tilde{P}$:

$$Z_t = \tilde{b}t + \sqrt{c}\tilde{W}_t + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \tilde{\nu}^Z)(ds, dx),$$

with

$$\tilde{b}t = bt + \int_0^t c\beta_s\,ds + \int_0^t\int_{\mathbb{R}} x\,(Y(s, x) - 1)\,\nu^Z(ds, dx).$$

3.18.7 Stochastic differential equations with jumps

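Consider the compensated Poisson random measure

$$\tilde{\mu}(dt, dz) = \mu(dt, dz) - \nu(dz)\,dt$$

where $\nu(\cdot)$ is a Lévy measure

$$\int_{\mathbb{R}} (1 \wedge |z|^2)\,\nu(dz) < \infty.$$

Let $\mathcal{F}_t$ be the $\sigma$-algebra generated by the Brownian motion $B_s$ and $\tilde{\mu}(ds, dz)$ for $z \in \mathbb{R}$ and $s \leq t$, and enlarged by all the sets of $P$-zero probability. Consider the filtration $\mathcal{F} = \{\mathcal{F}_t, t \geq 0\}$.

Definition 3.18.14 (Itô-Lévy process) Let $a(t)$, $b(t)$ and $c(t, z)$ be predictable processes, for all $t \geq 0$, $z \in \mathbb{R}$, such that

$$\int_0^t \left(|a(s)| + b^2(s) + \int_{\mathbb{R}} c^2(s, z)\,\nu(dz)\right)ds < \infty \quad \text{a.s.} \qquad (3.57)$$

The stochastic process $\{X_t, t \geq 0\}$, $X_0 = x$, admitting the stochastic integral representation

$$X_t = x + \int_0^t a(s)\,ds + \int_0^t b(s)\,dB_s + \int_0^t\int_{\mathbb{R}} c(s, z)\,\tilde{\mu}(ds, dz) \qquad (3.58)$$

is called an Itô-Lévy process. The stochastic differential equation for $X_t$ is written as follows:

$$dX_t = a(t)\,dt + b(t)\,dB_t + \int_{\mathbb{R}} c(t, z)\,\tilde{\mu}(dt, dz), \quad X_0 = x.$$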


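Under condition (3.57) the stochastic integrals are well-defined and local martingales. If we replace condition (3.57) with the next one

$$E\left\{\int_0^t \left(|a(s)| + b^2(s) + \int_{\mathbb{R}} c^2(s, z)\,\nu(dz)\right)ds\right\} < \infty$$

the stochastic integrals become martingales.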

Example 3.18.15 In applications we often take $a(t) = a(X_t)$, $b(t) = b(X_t)$ and $c(t, z) = c(X_{t-}, z)$ and write the stochastic differential equation as follows:

$$dX_t = a(X_t)\,dt + b(X_t)\,dB_t + \int_{|z| > 1} c(X_{t-}, z)\,\mu(dt, dz) + \int_{|z| \leq 1} c(X_{t-}, z)\,\{\mu(dt, dz) - \nu(dz)\,dt\},$$

where $\mu$ is the random measure associated with the jumps of $Z$,

$$\mu(dt, dz) = \sum_{s > 0} 1_{\{\Delta Z_s \neq 0\}}\,\delta_{(s, \Delta Z_s)}(dt, dz),$$

and $\delta$ denotes the Dirac measure. The process $Z_t$ is the driving pure-jump Lévy process of the form:

$$Z_t = \int_0^t\int_{|z| \leq 1} z\,\{\mu(ds, dz) - \nu(dz)\,ds\} + \int_0^t\int_{|z| > 1} z\,\mu(ds, dz).$$


3.18.8 Itô formula for Lévy driven stochastic differential equations

Theorem 3.18.16 Let $\{X_t, t \ge 0\}$ be an Itô-Lévy process as in (3.58) and let $f : (0,\infty) \times \mathbb{R} \to \mathbb{R}$ be a continuous function, differentiable with respect to the first argument and twice differentiable with respect to the second argument. Set $Y_t = f(t, X_t)$, $t \ge 0$. Then $Y_t$ is also an Itô-Lévy process which satisfies

$$
\begin{aligned}
dY_t ={}& \frac{\partial}{\partial t} f(t,X_t)\,dt + \frac{\partial}{\partial x} f(t,X_t)\,a(t)\,dt + \frac{\partial}{\partial x} f(t,X_t)\,b(t)\,dB_t \\
&+ \frac{1}{2}\frac{\partial^2}{\partial x^2} f(t,X_t)\,b^2(t)\,dt \\
&+ \int_{\mathbb{R}} \left( f(t, X_t + c(t,z)) - f(t,X_t) - \frac{\partial}{\partial x} f(t,X_t)\,c(t,z) \right) \nu(dz)\,dt \\
&+ \int_{\mathbb{R}} \left( f(t, X_{t-} + c(t,z)) - f(t,X_{t-}) \right) \tilde{\mu}(dt,dz).
\end{aligned}
$$


The next example is taken from Di Nunno et al. (2009).

Example 3.18.17 (Generalized geometric Lévy process) Consider the stochastic differential equation for the càdlàg process $Z$ solution to

$$dZ_t = Z_{t-}\left[ a(t)\,dt + b(t)\,dB_t + \int_{\mathbb{R}} c(t,z)\,\tilde{\mu}(dt,dz) \right], \qquad Z_0 = z_0 > 0,$$

where $a(t)$, $b(t)$ and $c(t,z) > -1$ are given predictable processes satisfying (3.57). We now prove that the solution of the above stochastic differential equation is

$$Z_t = z_0 e^{X_t}, \qquad t \ge 0,$$

where

$$
\begin{aligned}
X_t ={}& \int_0^t \left\{ a(s) - \frac{1}{2} b^2(s) + \int_{\mathbb{R}} \left( \log(1 + c(s,z)) - c(s,z) \right) \nu(dz) \right\} ds \\
&+ \int_0^t b(s)\,dB_s + \int_0^t\!\int_{\mathbb{R}} \log(1 + c(s,z))\,\tilde{\mu}(ds,dz).
\end{aligned}
$$

We use the Itô formula for $Y_t = f(t, X_t)$ with $f(t,x) = z_0 e^{x}$:

$$
\begin{aligned}
dY_t ={}& z_0 e^{X_t} \left[ \left\{ a(t) - \frac{1}{2} b^2(t) + \int_{\mathbb{R}} \left( \log(1 + c(t,z)) - c(t,z) \right) \nu(dz) \right\} dt + b(t)\,dB_t \right] \\
&+ \frac{1}{2} z_0 e^{X_t} b^2(t)\,dt \\
&+ \int_{\mathbb{R}} z_0 \left( e^{X_t + \log(1 + c(t,z))} - e^{X_t} - e^{X_t} \log(1 + c(t,z)) \right) \nu(dz)\,dt \\
&+ \int_{\mathbb{R}} z_0 \left( e^{X_{t-} + \log(1 + c(t,z))} - e^{X_{t-}} \right) \tilde{\mu}(dt,dz) \\
={}& Y_{t-} \left[ a(t)\,dt + b(t)\,dB_t + \int_{\mathbb{R}} c(t,z)\,\tilde{\mu}(dt,dz) \right]
\end{aligned}
$$

which is the required stochastic differential equation.


3.19 Stochastic differential equations in $\mathbb{R}^n$

We briefly describe the multidimensional version of stochastic differential 
equations and the Ito formula in the multidimensional case. We first start with 
the definition of the multidimensional Brownian motion. 

Definition 3.19.1 (multidimensional Brownian motion) Let $\{B_1(t), t \ge 0\}$, $\{B_2(t), t \ge 0\}$, ..., $\{B_m(t), t \ge 0\}$ be $m$ independent Brownian motions. The vector

$$B_t = (B_1(t), B_2(t), \ldots, B_m(t)), \qquad t \ge 0,$$

is called $m$-dimensional Brownian motion.

The multidimensional Brownian motion is such that

$$B_t - B_s \sim N(0, I\,(t - s))$$

where $I$ is the $m \times m$ identity matrix, $0$ is the zero vector of $\mathbb{R}^m$, and $N(\cdot,\cdot)$ is the multivariate normal distribution.

Definition 3.19.2 An Itô diffusion is a stochastic process $\{X_t, 0 \le t \le T\}$ with $X_t(\omega) : [0,T] \times \Omega \to \mathbb{R}^n$ satisfying a stochastic differential equation of the form:

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \qquad t \ge 0, \quad X_0 = x, \qquad (3.60)$$

where $\{B_t, t \ge 0\}$ is an $m$-dimensional Brownian motion, $b : [0,T] \times \mathbb{R}^n \to \mathbb{R}^n$, $\sigma : [0,T] \times \mathbb{R}^n \to \mathbb{R}^{n \times m}$ such that

$$|b(t,x) - b(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le L|x - y|, \qquad x, y \in \mathbb{R}^n, \; t \in [0,T],$$

where $|\sigma|^2 = \sum_{i,j} |\sigma_{ij}|^2$ and

$$|b(t,x)| + |\sigma(t,x)| \le L(1 + |x|), \qquad x \in \mathbb{R}^n, \; t \in [0,T].$$

If there exists a random variable $Z$ independent of $\{B_t, t \ge 0\}$ such that $E|Z|^2 < \infty$, then the stochastic differential equation also has a unique solution and it is such that

$$E\left\{ \int_0^T |X_t|^2\,dt \right\} < \infty.$$


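The stochastic differential equation (3.60) has to be understood in matrix form with

$$
X_t = \begin{pmatrix} X_1(t) \\ \vdots \\ X_n(t) \end{pmatrix}, \qquad
b(t, X_t) = \begin{pmatrix} b_1(t, X_t) \\ \vdots \\ b_n(t, X_t) \end{pmatrix}, \qquad
\sigma = \begin{pmatrix} \sigma_{11}(t, X_t) & \cdots & \sigma_{1m}(t, X_t) \\ \vdots & & \vdots \\ \sigma_{n1}(t, X_t) & \cdots & \sigma_{nm}(t, X_t) \end{pmatrix}.
$$

More precisely,

$$
\begin{aligned}
dX_1(t) &= b_1(t, X_t)\,dt + \sum_{j=1}^m \sigma_{1j}(t, X_t)\,dB_j(t) \\
dX_2(t) &= b_2(t, X_t)\,dt + \sum_{j=1}^m \sigma_{2j}(t, X_t)\,dB_j(t) \\
&\;\;\vdots \\
dX_n(t) &= b_n(t, X_t)\,dt + \sum_{j=1}^m \sigma_{nj}(t, X_t)\,dB_j(t)
\end{aligned}
$$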
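As a rough illustration of the componentwise form above, the following sketch discretizes (3.60) on a regular grid with a plain Euler step. The Euler scheme, the helper name `euler_sde` and the example coefficients are assumptions introduced here for illustration only; they are not taken from the text.

```r
# Minimal Euler sketch for dX_t = b(t, X_t) dt + sigma(t, X_t) dB_t in R^n,
# driven by an m-dimensional Brownian motion (hypothetical helper).
euler_sde <- function(x0, b, sigma, m, T_end = 1, n_steps = 1000) {
  dt <- T_end / n_steps
  n <- length(x0)
  X <- matrix(NA_real_, nrow = n_steps + 1, ncol = n)
  X[1, ] <- x0
  for (k in 1:n_steps) {
    t_k <- (k - 1) * dt
    dB <- rnorm(m, mean = 0, sd = sqrt(dt))    # m independent N(0, dt) increments
    X[k + 1, ] <- X[k, ] + b(t_k, X[k, ]) * dt +
      as.vector(sigma(t_k, X[k, ]) %*% dB)     # (n x m) matrix times m-vector
  }
  X   # one row per grid point, one column per component
}

set.seed(123)
b_fun <- function(t, x) -x                                        # illustrative drift
sig_fun <- function(t, x) matrix(c(0.5, 0, 0.2, 0.3), nrow = 2)   # illustrative 2 x 2 diffusion
path <- euler_sde(x0 = c(1, 2), b = b_fun, sigma = sig_fun, m = 2)
dim(path)
```

Each row of `path` is the state at one grid time; the $i$-th column follows the $i$-th equation of the system, with all $m$ Brownian increments entering through the $i$-th row of $\sigma$.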

Theorem 3.19.3 (Itô formula) Let $B_t$ be an $m$-dimensional Brownian motion and $X(t)$ an $n$-dimensional Itô process solution to a stochastic differential equation like (3.60). Let $f : [0,T] \times \mathbb{R}^n \to \mathbb{R}$ be once differentiable in $t$ and twice with respect to the second argument. Then

$$df(t, X_t) = \frac{\partial}{\partial t} f(t, X_t)\,dt + \sum_{i=1}^n \frac{\partial}{\partial x_i} f(t, X_t)\,dX_i(t) + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2}{\partial x_i \partial x_j} f(t, X_t)\,dX_i(t)\,dX_j(t)$$

with the usual rules $dB_i\,dB_j = \delta_{ij}\,dt$, $dB_i\,dt = dt\,dB_i = 0$, where $\delta_{ij} = 1$ only if $i = j$ and 0 otherwise.


3.20 Markov switching diffusions 

Let $\alpha_t$ be a finite-state Markov chain in continuous time with state space $S$ and generator $Q = [q_{ij}]$ (see Definition 3.4.17), i.e. the entries of $Q$ are such that $q_{ij} \ge 0$ for $i \ne j$, $\sum_{j \in S} q_{ij} = 0$ for each $i \in S$. Suppose that $f(\cdot,\cdot) : \mathbb{R}^n \times S \to \mathbb{R}^n$ and $g(\cdot,\cdot) : \mathbb{R}^n \times S \to \mathbb{R}^{n \times n}$. Consider the so-called $n$-dimensional hybrid diffusion system or stochastic differential equation with Markov switching

$$dX_t = f(X_t, \alpha_t)\,dt + g(X_t, \alpha_t)\,dB_t, \qquad X_0 = x_0, \quad \alpha(0) = \alpha, \qquad (3.61)$$

where $\{B_t, t \ge 0\}$ is an $n$-dimensional Brownian motion. The initial value $x_0$, the Brownian motion $B$ and the Markov chain $\alpha_t$ are all mutually independent. The functions $f(\cdot,\cdot)$ and $g(\cdot,\cdot)$ satisfy certain regularity conditions which can be found, for example, in Mao and Yuan (2006), but the usual Lipschitz and growth conditions are enough. In this section we assume that, for each state $i \in S$, the functions $f(\cdot,i)$ and $g(\cdot,i)$ satisfy the usual Lipschitz and growth conditions so that (3.61) has a unique solution in distribution for each initial condition (see, e.g. Yin et al. (2005)) as stated by the next result.

Theorem 3.20.1 (Mao 1999) Let $f$ and $g$ be globally Lipschitz continuous, i.e., there exists a constant $L > 0$ such that

$$\max\left( |f(u,j) - f(v,j)|^2, \; |g(u,j) - g(v,j)|^2 \right) \le L|u - v|^2, \qquad \forall u, v \in \mathbb{R}^n, \; j \in S.$$

Then, (3.61) has a unique solution for any given initial value $X_0 = x_0 \in \mathbb{R}^n$ and $\alpha_0 = i \in S$. Moreover, the joint process

$$Z_t = (X_t, \alpha_t), \qquad t \ge 0,$$

is a time-homogeneous Markov process.

For the switching diffusion model, there exists an associated operator defined as

$$\mathcal{L}h(x,i) = \frac{1}{2}\,\mathrm{tr}\left[ \nabla^2 h(x,i)\,g(x,i)\,g(x,i)' \right] + \nabla h(x,i)' f(x,i) + Qh(x,\cdot)(i)$$

where $\nabla h$ and $\nabla^2 h$ denote the gradient and the Hessian of $h$, $\mathrm{tr}A$ is the trace of matrix $A$ and

$$Qh(x,\cdot)(i) = \sum_{j \in S} q_{ij}\,h(x,j) = \sum_{j \in S} q_{ij}\left( h(x,j) - h(x,i) \right).$$

Associated with (3.61) there is a martingale problem formulation.

Definition 3.20.2 A process $(X_t, \alpha_t)$, $t \ge 0$, is said to be a solution of the martingale problem for the operator $\mathcal{L}$ if

$$h(X_t, \alpha_t) - h(X_0, \alpha_0) - \int_0^t \mathcal{L}h(X_s, \alpha_s)\,ds$$

is a martingale for any function $h : \mathbb{R}^n \times S \to \mathbb{R}$, such that for each $i \in S$, the function $h(\cdot, i)$ is twice continuously differentiable with respect to the first argument and with compact support.

Under the conditions of Theorem 3.20.1, it is possible to prove that (3.61) has 
a unique solution associated with the martingale problem. For these Markov 
switching systems one interesting field of investigation is the asymptotic stability 
and the existence of an invariant measure. 

Definition 3.20.3 The process $Z_t = (X_t, \alpha_t)$ is said to be asymptotically stable in distribution if there exists a probability measure $\pi(\cdot \times \cdot)$ on $\mathbb{R}^n \times S$ such that the transition probability $p(t, y, i, dx \times \{j\})$ of $Z_t$ converges weakly to $\pi(dx \times \{j\})$ as $t \to \infty$ for every $(y,i) \in \mathbb{R}^n \times S$. The corresponding stochastic differential equation (3.61) is also said to be asymptotically stable in distribution.

When a switching diffusion is asymptotically stable in distribution then $X_t$ has a unique invariant probability measure. Under the conditions of Theorem 3.20.1 and additional tightness conditions, it is possible to prove that system (3.61) is asymptotically stable in distribution. In the following, we will always work under these conditions unless explicitly mentioned.

3.21 Solution to exercises 

Solution 3.1 (to Exercise 3.1) By using Jensen's inequality (2.21) for the convex function $f(x) = |x|$ we get

$$E\{Z_n | \mathcal{F}_{n-1}\} = E\{|S_n| \,|\, \mathcal{F}_{n-1}\} \ge |E\{S_n | \mathcal{F}_{n-1}\}| = |S_{n-1}| = Z_{n-1},$$

where in the last passage we used the fact that the random walk $S_n$ is a martingale.

Solution 3.2 (to Exercise 3.2) Clearly, $Z_n$ is $\mathcal{F}_n$-measurable. Further,

$$
\begin{aligned}
E\{Z_n | \mathcal{F}_{n-1}\} &= E\left\{ M(\alpha)^{-n} e^{\alpha S_n} \,|\, \mathcal{F}_{n-1} \right\} = M(\alpha)^{-n} E\left\{ e^{\alpha S_n} \,|\, \mathcal{F}_{n-1} \right\} \\
&= M(\alpha)^{-n} E\left\{ e^{\alpha (S_{n-1} + X_n)} \,|\, \mathcal{F}_{n-1} \right\} = M(\alpha)^{-n} e^{\alpha S_{n-1}} E\left\{ e^{\alpha X_n} \,|\, \mathcal{F}_{n-1} \right\} \\
&= M(\alpha)^{-n} e^{\alpha S_{n-1}} E\left( e^{\alpha X_n} \right) = M(\alpha)^{-n+1} e^{\alpha S_{n-1}} = Z_{n-1}.
\end{aligned}
$$

We need to prove that $E|Z_n| < \infty$. Indeed, by using (2.9) in our context, we get that $E(e^{\alpha S_n}) = M(\alpha)^n$. Therefore, noticing that $Z_n$ is non-negative, we have $E|Z_n| = M(\alpha)^{-n} E(e^{\alpha S_n}) = 1 < \infty$.


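Solution 3.3 (to Exercise 3.3) Clearly $Z_n$ is $\mathcal{F}_n$-measurable. Remember that $|a - b| \le |a| + |b|$. Hence

$$
\begin{aligned}
E|Z_n| &= E\left| \left( \sum_{i=1}^n X_i \right)^2 - n\sigma^2 \right| \le E\left( \sum_{i=1}^n X_i \right)^2 + n\sigma^2 = E\left\{ \sum_{i=1}^n X_i^2 + \sum_{i \ne j} X_i X_j \right\} + n\sigma^2 \\
&= \sum_{i=1}^n E\{X_i^2\} + \sum_{i \ne j} E\{X_i X_j\} + n\sigma^2 = 2n\sigma^2 < \infty.
\end{aligned}
$$

Finally,

$$
\begin{aligned}
E\{Z_n | \mathcal{F}_{n-1}\} &= E\left\{ \left( \sum_{i=1}^n X_i \right)^2 - n\sigma^2 \,\Big|\, \mathcal{F}_{n-1} \right\} = E\left\{ \left( X_n + \sum_{i=1}^{n-1} X_i \right)^2 - n\sigma^2 \,\Big|\, \mathcal{F}_{n-1} \right\} \\
&= E\left\{ X_n^2 + 2X_n \sum_{i=1}^{n-1} X_i + \left( \sum_{i=1}^{n-1} X_i \right)^2 - n\sigma^2 \,\Big|\, \mathcal{F}_{n-1} \right\} \\
&= E\left\{ \left( \sum_{i=1}^{n-1} X_i \right)^2 - (n-1)\sigma^2 + X_n^2 + 2X_n \sum_{i=1}^{n-1} X_i - \sigma^2 \,\Big|\, \mathcal{F}_{n-1} \right\} \\
&= E\{Z_{n-1} | \mathcal{F}_{n-1}\} + E\{X_n^2 | \mathcal{F}_{n-1}\} + 2\sum_{i=1}^{n-1} X_i\,E\{X_n | \mathcal{F}_{n-1}\} - \sigma^2 \\
&= Z_{n-1} + E\{X_n^2\} + 2\sum_{i=1}^{n-1} X_i\,E\{X_n\} - \sigma^2 \\
&= Z_{n-1} + \sigma^2 - \sigma^2 = Z_{n-1}.
\end{aligned}
$$

Solution 3.4 (to Exercise 3.4) We need to rewrite inf and sup appropriately, then the properties of the filtration ensure the result. Indeed,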



Solution 3.5 (to Exercise 3.5) Clearly $X_t$ is $\mathcal{F}_t$-measurable and integrable. We can rewrite $X_t = B_t^2 - t = (B_t - B_s + B_s)^2 - t$ for $s < t$. Hence $X_t = (B_t - B_s)^2 + B_s^2 + 2(B_t - B_s)B_s - t$ and

$$
\begin{aligned}
E\{X_t | \mathcal{F}_s\} &= E\{(B_t - B_s)^2 + B_s^2 + 2(B_t - B_s)B_s - t \,|\, \mathcal{F}_s\} \\
&= E(B_t - B_s)^2 + B_s^2 + 2B_s E(B_t - B_s) - t \\
&= (t - s) + B_s^2 - t = B_s^2 - s = X_s.
\end{aligned}
$$

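Solution 3.6 (to Exercise 3.6) The process $X_t$ is clearly $\mathcal{F}_t$-measurable. Using calculations similar to that of Exercise 2.11, one can easily show that if $Y \sim N(0, \sigma^2)$ then $E(e^{Y}) = e^{\frac{1}{2}\sigma^2}$ and also that $E|X_t| < \infty$. We first assume that $X_t$ is a martingale. Hence

$$E(X_t) = E\left( e^{\mu t + \sigma B_t} \right) = e^{\mu t} E\left( e^{\sigma B_t} \right) = e^{\mu t} e^{\frac{1}{2}\sigma^2 t} = e^{(\mu + \frac{1}{2}\sigma^2)t}.$$

But $X_t$ is a martingale, hence $E(X_t) = E(X_s) = k$, for all $t$ and $s$, where $k$ is some constant. Therefore, $E(X_t)$ should not depend on $t$ and this is true only if $\mu = -\frac{1}{2}\sigma^2$. We now assume that $\mu = -\frac{1}{2}\sigma^2$ and show that $X_t$ is a martingale.

$$
\begin{aligned}
E\{X_t | \mathcal{F}_s\} &= e^{\mu t} E\left\{ e^{\sigma B_t} \,|\, \mathcal{F}_s \right\} = e^{\mu t} e^{\sigma B_s} E\left\{ e^{\sigma (B_t - B_s)} \,|\, \mathcal{F}_s \right\} = e^{\mu t + \sigma B_s} E\left( e^{\sigma (B_t - B_s)} \right) \\
&= e^{\mu t + \sigma B_s} e^{\frac{1}{2}\sigma^2 (t - s)} = e^{\mu t + \sigma B_s - \mu (t - s)} = e^{\mu s + \sigma B_s} = X_s.
\end{aligned}
$$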


Solution 3.7 (to Exercise 3.7) We need to identify an $f(\cdot)$ such that $f'(x) = x^2$. This is clearly $f(x) = \frac{1}{3}x^3$, hence we apply (3.44) for this $f(\cdot)$ and obtain

$$\frac{B_t^3}{3} = \int_0^t B_s^2\,dB_s + \frac{1}{2}\int_0^t 2B_s\,ds$$

because $f''(x) = 2x$. Hence

$$\int_0^t B_s^2\,dB_s = \frac{B_t^3}{3} - \int_0^t B_s\,ds.$$

Solution 3.8 (to Exercise 3.8) In this case $f_x(t,x) = t$ hence

$$f(t,x) = tx, \quad f_t(t,x) = x, \quad f_x(t,x) = t, \quad f_{xx}(t,x) = 0,$$

then, using (3.45), we obtain

$$tB_t = 0 + \int_0^t B_s\,ds + \int_0^t s\,dB_s + \frac{1}{2}\int_0^t 0\,ds,$$

therefore

$$\int_0^t s\,dB_s = tB_t - \int_0^t B_s\,ds.$$
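The identities obtained in Solutions 3.7 and 3.8 can be checked numerically on one simulated path. The sketch below approximates the Itô integrals with left-endpoint sums on a fine grid; the grid size, the seed and all variable names are choices made here for illustration, and the two sides agree only up to a discretization error.

```r
set.seed(123)
n <- 1e5
T_end <- 1
dt <- T_end / n
s <- (0:(n - 1)) * dt                     # left endpoints of the grid
dB <- rnorm(n, mean = 0, sd = sqrt(dt))   # Brownian increments
B <- c(0, cumsum(dB))                     # B[k + 1] is the path at time k * dt
B_left <- B[1:n]                          # Ito sums use left endpoints
B_T <- B[n + 1]

# Solution 3.7: int_0^T B_s^2 dB_s  versus  B_T^3 / 3 - int_0^T B_s ds
lhs_37 <- sum(B_left^2 * dB)
rhs_37 <- B_T^3 / 3 - sum(B_left * dt)

# Solution 3.8: int_0^T s dB_s  versus  T * B_T - int_0^T B_s ds
lhs_38 <- sum(s * dB)
rhs_38 <- T_end * B_T - sum(B_left * dt)

round(c(lhs_37 = lhs_37, rhs_37 = rhs_37, lhs_38 = lhs_38, rhs_38 = rhs_38), 4)
```

Evaluating the integrand at the left endpoint matters here: a midpoint or right-endpoint sum would converge to a different (non-Itô) integral in the first identity.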


Solution 3.9 (to Exercise 3.9) Consider the function $f(t,x) = \frac{2x^{n+1}}{n+1} - t^2 x$. Then, $f_x(t,x) = 2x^n - t^2$, $f_{xx}(t,x) = 2nx^{n-1}$ and $f_t(t,x) = -2tx$. Apply the Itô formula to $f(t, B_t) = \frac{2B_t^{n+1}}{n+1} - t^2 B_t$:

$$
\begin{aligned}
\frac{2B_t^{n+1}}{n+1} - t^2 B_t &= \int_0^t f_x(s, B_s)\,dB_s + \frac{1}{2}\int_0^t f_{xx}(s, B_s)\,ds + \int_0^t f_t(s, B_s)\,ds \\
&= \int_0^t (2B_s^n - s^2)\,dB_s + \int_0^t nB_s^{n-1}\,ds - \int_0^t 2sB_s\,ds \\
&= \int_0^t (2B_s^n - s^2)\,dB_s + \int_0^t (nB_s^{n-1} - 2sB_s)\,ds.
\end{aligned}
$$

Therefore,

$$\int_0^t (2B_s^n - s^2)\,dB_s = \frac{2B_t^{n+1}}{n+1} - t^2 B_t - \int_0^t (nB_s^{n-1} - 2sB_s)\,ds.$$


Solution 3.10 (to Exercise 3.10) We apply (3.47) with $f(t,x) = x^2$ and $X_t = B_t$. Hence $f_t(t,x) = 0$, $f_x(t,x) = 2x$ and $f_{xx}(t,x) = 2$.

$$dY_t = 2B_t\,dB_t + \frac{1}{2}\,2\,(dB_t)^2 = 2B_t\,dB_t + dt.$$

Solution 3.11 (to Exercise 3.11) We apply (3.47) with $f(t,x) = 2 + t + e^x$ and $X_t = B_t$. Hence $f_t(t,x) = 1$, $f_x(t,x) = e^x$ and $f_{xx}(t,x) = e^x$.

$$dY_t = dt + e^{B_t}\,dB_t + \frac{1}{2}e^{B_t}\,dt = \left( 1 + \frac{1}{2}e^{B_t} \right) dt + e^{B_t}\,dB_t.$$

Solution 3.12 (to Exercise 3.12) Denote by $Z_t = \int_0^t e^{\lambda s}\,dB_s$ and define $f(t,x) = xe^{-\lambda t}$, therefore $U_t = f(t, Z_t)$. We now apply the Itô formula to this $f(t,x)$.

$$f_x(t,x) = e^{-\lambda t}, \quad f_{xx}(t,x) = 0, \quad f_t(t,x) = -\lambda x e^{-\lambda t} = -\lambda f(t,x).$$

We now apply (3.47) to $df(t, Z_t) = dU_t$:

$$dU_t = -\lambda Z_t e^{-\lambda t}\,dt + e^{-\lambda t}\,dZ_t.$$

Now notice that $Z_t e^{-\lambda t} = U_t$ and $dZ_t = e^{\lambda t}\,dB_t$. Thus

$$dU_t = -\lambda U_t\,dt + dB_t.$$

The process $U_t$ is called the Ornstein-Uhlenbeck process.
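A quick numerical cross-check of this result: the sketch below builds $U_t = e^{-\lambda t}\int_0^t e^{\lambda s}\,dB_s$ from left-endpoint sums and compares it with an Euler recursion for $dU_t = -\lambda U_t\,dt + dB_t$ driven by the same increments. The Euler recursion, the value of `lambda` and the grid are assumptions made here for illustration, and the two paths coincide only up to discretization error.

```r
set.seed(123)
lambda <- 1
T_end <- 1
n <- 1e4
dt <- T_end / n
t_grid <- (0:n) * dt
dB <- rnorm(n, mean = 0, sd = sqrt(dt))

# (a) U_t = exp(-lambda * t) * int_0^t exp(lambda * s) dB_s via left-endpoint sums
Z <- c(0, cumsum(exp(lambda * t_grid[1:n]) * dB))
U_int <- exp(-lambda * t_grid) * Z

# (b) Euler recursion for dU = -lambda * U dt + dB using the same increments
U_eul <- numeric(n + 1)
for (i in 1:n) {
  U_eul[i + 1] <- U_eul[i] - lambda * U_eul[i] * dt + dB[i]
}

max(abs(U_int - U_eul))   # small, and shrinking as dt decreases
```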

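Solution 3.13 (to Exercise 3.13) Note that $f_x(x) = \cos(x)$, $f_{xx}(x) = -\sin(x)$. An application of the Itô formula gives

$$Y_t = Y_0 + \int_0^t f_x(X_s)\,dX_s + \frac{1}{2}\int_0^t f_{xx}(X_s)\,(dX_s)^2.$$

But

$$\int_0^t f_x(X_s)\,dX_s = \int_0^t \cos(X_s)X_s\,ds + \sigma \int_0^t \cos(X_s)\,dB_s$$

and

$$
\begin{aligned}
\frac{1}{2}\int_0^t f_{xx}(X_s)\,(dX_s)^2 &= -\frac{1}{2}\int_0^t \sin(X_s)\,(dX_s)^2 \\
&= -\frac{1}{2}\int_0^t \sin(X_s)\left( X_s^2 (ds)^2 + \sigma^2 (dB_s)^2 + 2\sigma X_s\,ds\,dB_s \right) \\
&= -\frac{\sigma^2}{2}\int_0^t \sin(X_s)\,ds.
\end{aligned}
$$

Therefore

$$Y_t = Y_0 + \int_0^t \left( \cos(X_s)X_s - \frac{\sigma^2}{2}\sin(X_s) \right) ds + \sigma \int_0^t \cos(X_s)\,dB_s$$

or

$$dY_t = \left( \cos(X_t)X_t - \frac{\sigma^2}{2}\sin(X_t) \right) dt + \sigma \cos(X_t)\,dB_t.$$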


Solution 3.14 (to Exercise 3.14) We try to apply the Itô formula to derive the result. Consider the function $f(t,x) = \log(x)$. Then, $f_x(t,x) = \frac{1}{x}$, $f_{xx}(t,x) = -\frac{1}{x^2}$ and $f_t(t,x) = 0$. Therefore,

$$
\begin{aligned}
\log(X_t) &= \log(X_0) + \int_0^t f_x(s, X_s)\,dX_s + \frac{1}{2}\int_0^t f_{xx}(s, X_s)\,(dX_s)^2 \\
&= \log(X_0) + \int_0^t \frac{dX_s}{X_s} - \frac{1}{2}\int_0^t \frac{(dX_s)^2}{X_s^2} \\
&= \log(X_0) + \int_0^t \frac{1}{X_s}\left( b(s)X_s\,ds + \sigma(s)X_s\,dB_s \right) \\
&\quad - \frac{1}{2}\int_0^t \frac{1}{X_s^2}\left( b^2(s)X_s^2 (ds)^2 + \sigma^2(s)X_s^2 (dB_s)^2 + 2b(s)\sigma(s)X_s^2\,ds\,dB_s \right) \\
&= \log(X_0) + \int_0^t \left( b(s) - \frac{1}{2}\sigma^2(s) \right) ds + \int_0^t \sigma(s)\,dB_s.
\end{aligned}
$$

Remember that $\exp\{a + b\} = \exp\{a\}\exp\{b\}$, then

$$X_t = X_0 \exp\left\{ \int_0^t \left( b(s) - \frac{1}{2}\sigma^2(s) \right) ds + \int_0^t \sigma(s)\,dB_s \right\}.$$

To end the proof, we need to prove that $X_t$ is a solution of the proposed stochastic differential equation. This is trivially true and left to the reader.

Solution 3.15 (to Exercise 3.15) Let $f(t,x) = \frac{x}{1+t}$, then $f_x(t,x) = \frac{1}{1+t}$, $f_{xx}(t,x) = 0$ and $f_t(t,x) = -\frac{x}{(1+t)^2}$. In order to show that $X_t$ is a solution of the stochastic differential equation considered, we apply the Itô formula to $f(t, B_t) = X_t = \frac{B_t}{1+t}$. Indeed,

$$
\begin{aligned}
X_t = f(t, B_t) &= \int_0^t f_x(s, B_s)\,dB_s + \frac{1}{2}\int_0^t f_{xx}(s, B_s)\,ds + \int_0^t f_t(s, B_s)\,ds \\
&= \int_0^t \frac{1}{1+s}\,dB_s - \int_0^t \frac{B_s}{(1+s)^2}\,ds \\
&= \int_0^t \frac{1}{1+s}\,dB_s - \int_0^t \frac{X_s}{1+s}\,ds
\end{aligned}
$$

and hence, in differential form, we have

$$dX_t = -\frac{1}{1+t}X_t\,dt + \frac{1}{1+t}\,dB_t.$$


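Solution 3.16 (to Exercise 3.16) Let $f(t,x) = \left( a^{1/3} + \frac{1}{3}x \right)^3$ and notice that $f_x(t,x) = \left( a^{1/3} + \frac{1}{3}x \right)^2$, $f_{xx}(t,x) = \frac{2}{3}\left( a^{1/3} + \frac{1}{3}x \right)$ with $f_t(t,x) = 0$. As in Exercise 3.15, we apply the Itô formula to $f(t, B_t) = X_t = \left( a^{1/3} + \frac{1}{3}B_t \right)^3$. Hence,

$$
\begin{aligned}
X_t = f(t, B_t) &= f(0, B_0) + \int_0^t f_x(s, B_s)\,dB_s + \frac{1}{2}\int_0^t f_{xx}(s, B_s)\,ds + \int_0^t f_t(s, B_s)\,ds \\
&= a + \int_0^t \left( a^{1/3} + \frac{1}{3}B_s \right)^2 dB_s + \frac{1}{3}\int_0^t \left( a^{1/3} + \frac{1}{3}B_s \right) ds \\
&= X_0 + \int_0^t X_s^{2/3}\,dB_s + \frac{1}{3}\int_0^t X_s^{1/3}\,ds
\end{aligned}
$$

or $dX_t = \frac{1}{3}X_t^{1/3}\,dt + X_t^{2/3}\,dB_t$.

Solution 3.17 (to Exercise 3.17) Consider $f(t,x) = \frac{1}{1+x}$. Then, $f_x(t,x) = -\frac{1}{(1+x)^2}$, $f_{xx}(t,x) = \frac{2}{(1+x)^3}$ and $f_t(t,x) = 0$. We apply the Itô formula to $Y_t = f(t, B_t) = \frac{1}{1+B_t}$ and obtain

$$
\begin{aligned}
Y_t = f(t, B_t) &= f(0, B_0) + \int_0^t f_x(s, B_s)\,dB_s + \frac{1}{2}\int_0^t f_{xx}(s, B_s)\,ds + \int_0^t f_t(s, B_s)\,ds \\
&= 1 - \int_0^t \frac{1}{(1+B_s)^2}\,dB_s + \frac{1}{2}\int_0^t \frac{2}{(1+B_s)^3}\,ds \\
&= 1 - \int_0^t Y_s^2\,dB_s + \int_0^t Y_s^3\,ds
\end{aligned}
$$

and so $dY_t = Y_t^3\,dt - Y_t^2\,dB_t$.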

Solution 3.18 (to Exercise 3.18) Let $Y_t = \int_0^t \frac{1}{1-s}\,dB_s$. Then, we should use the Itô formula in differential form for $f(t,y) = t + (1-t)y$ so that $X_t = f(t, Y_t)$. Hence, $f_t(t,y) = 1 - y$, $f_y(t,y) = 1 - t$ and $f_{yy}(t,y) = 0$. Notice that $dY_t = \frac{1}{1-t}\,dB_t$. Then, by the Itô formula we get

$$
\begin{aligned}
dX_t = df(t, Y_t) &= (1 - Y_t)\,dt + (1-t)\,dY_t = (1 - Y_t)\,dt + dB_t \\
&= \left( 1 - \int_0^t \frac{1}{1-s}\,dB_s \right) dt + dB_t \\
&= \frac{(1-t) - (1-t)\int_0^t \frac{1}{1-s}\,dB_s}{1-t}\,dt + dB_t \\
&= \frac{1 - \left( t + (1-t)\int_0^t \frac{1}{1-s}\,dB_s \right)}{1-t}\,dt + dB_t \\
&= \frac{1 - X_t}{1-t}\,dt + dB_t.
\end{aligned}
$$


1 — t 1 — t 

Solution 3.19 (to Exercise 3.19) We need the first two derivatives of $f(\cdot)$

$$f_x(u) = \frac{1}{\sigma(u)}, \qquad f_{xx}(u) = -\frac{\sigma_x(u)}{\sigma^2(u)}$$

and we can apply the Itô formula

$$
\begin{aligned}
df(X_t) &= f_x(X_t)\,dX_t + \frac{1}{2}f_{xx}(X_t)\,(dX_t)^2 \\
&= \frac{\mu(X_t)\,dt + \sigma(X_t)\,dB_t}{\sigma(X_t)} - \frac{1}{2}\frac{\sigma_x(X_t)\left( \mu(X_t)\,dt + \sigma(X_t)\,dB_t \right)^2}{\sigma^2(X_t)} \\
&= \frac{\mu(X_t)}{\sigma(X_t)}\,dt + dB_t - \frac{1}{2}\frac{\sigma_x(X_t)\left( \mu^2(X_t)(dt)^2 + \sigma^2(X_t)(dB_t)^2 + 2\mu(X_t)\sigma(X_t)\,dt\,dB_t \right)}{\sigma^2(X_t)} \\
&= \frac{\mu(X_t)}{\sigma(X_t)}\,dt + dB_t - \frac{1}{2}\sigma_x(X_t)\,dt
\end{aligned}
$$

therefore

$$dY_t = df(X_t) = \left( \frac{\mu(X_t)}{\sigma(X_t)} - \frac{1}{2}\sigma_x(X_t) \right) dt + dB_t.$$

Notice that the Lamperti transform has the effect of making the diffusion coefficient unitary.


3.22 Bibliographical notes 

Classical references on stochastic processes, Brownian motion, stochastic differential equations and the Itô formula are Karlin and Taylor (1981), Karatzas and Shreve (1988) and Rogers and Williams (1987). For additional examples and exercises on Itô calculus, one can refer to Øksendal (1998). The books of Sato (1999), Bertoin (1998) and Applebaum (2004) contain rigorous treatments of the theory of Lévy processes and stochastic calculus with jump processes. For abstract stochastic analysis and calculus for several classes of processes we refer to Protter (2004). We also suggest the textbook of Di Nunno et al. (2009) which is a gentle introduction to Malliavin calculus in view of applications in finance. Finally, the books of Cont and Tankov (2004) and Schoutens (2003) review several types of Lévy processes along with their characteristic triplets, simulation schemes, etc.

4 Numerical methods


4.1 Monte Carlo method 

Suppose we are given a random variable $X$ and are interested in the evaluation of $E(g(X))$ where $g(\cdot)$ is some known function. If we are able to draw $n$ pseudo random numbers $x_1, \ldots, x_n$ from the distribution of $X$, then we can think about approximating $E(g(X))$ with the sample mean of the $g(x_i)$,

$$E(g(X)) \simeq \frac{1}{n}\sum_{i=1}^n g(x_i) = \bar{g}_n. \qquad (4.1)$$

The expression (4.1) is not just symbolic but holds true in the sense of the law of large numbers whenever $E|g(X)| < \infty$. Moreover, the central limit theorem guarantees that

$$\sqrt{n}\left( \bar{g}_n - E(g(X)) \right) \xrightarrow{d} N\left( 0, \mathrm{Var}(g(X)) \right)$$

where $N(m, s^2)$ denotes the distribution of the Gaussian random variable with expected value $m$ and variance $s^2$. In the end, the number we estimate with simulations will have a deviation from the true expected value $E(g(X))$ of order $1/\sqrt{n}$. Given that $P(|Z| < 1.96) \simeq 0.95$, $Z \sim N(0,1)$, one can construct an interval for the estimate $\bar{g}_n$ of the form:

$$\left( E(g(X)) - 1.96\frac{\sigma}{\sqrt{n}}, \; E(g(X)) + 1.96\frac{\sigma}{\sqrt{n}} \right)$$

with $\sigma = \sqrt{\mathrm{Var}(g(X))}$, which is interpreted such that the Monte Carlo estimate of $E(g(X))$ above is included in the interval above 95% of the time. The confidence interval depends on $\mathrm{Var}(g(X))$, and usually this quantity has to be estimated through the sample as well. Indeed, one can estimate it as the sample variance of Monte Carlo replications as

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^n \left( g(x_i) - \bar{g}_n \right)^2$$

and use the following 95% level Monte Carlo confidence interval¹ for $E(g(X))$:

$$\left( \bar{g}_n - 1.96\frac{\hat{\sigma}}{\sqrt{n}}, \; \bar{g}_n + 1.96\frac{\hat{\sigma}}{\sqrt{n}} \right).$$

The quantity $\hat{\sigma}/\sqrt{n}$ is called the standard error. The standard error is itself a random quantity and thus subject to variability; hence one should interpret this value as a measure of accuracy.

One more remark is that the rate of convergence $\sqrt{n}$ is not particularly fast but at least is independent of the smoothness of $g(\cdot)$. Moreover, if we need to increase the quality of our approximation, we just need to draw additional new samples instead of rerunning the whole simulation.
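The estimate, its standard error and the 95% interval described above take only a few lines of R. The sketch below is illustrative: the choice $g(x) = x^2$ with $X \sim N(0,1)$ (so that $E(g(X)) = 1$), the sample size and the variable names are assumptions made here, not part of the text.

```r
set.seed(123)
g <- function(x) x^2
n <- 1e5
x <- rnorm(n)                    # draws from X ~ N(0, 1), chosen for illustration
gx <- g(x)

g_bar <- mean(gx)                # Monte Carlo estimate of E(g(X)); true value is 1
se <- sd(gx) / sqrt(n)           # standard error (sd() uses the n - 1 denominator)
ci <- c(lower = g_bar - 1.96 * se, upper = g_bar + 1.96 * se)

g_bar
ci
```

Halving the width of the interval requires roughly four times as many replications, which is the $\sqrt{n}$ rate at work.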


4.1.1 An application 

Suppose we have a function $g$ defined on $[a,b]$ taking values in $[c,d]$, $c, d \ge 0$. Assume that we want to calculate the integral

$$\int_a^b g(x)\,dx.$$

Expected values are of course integrals, so we can apply the Monte Carlo method. The integral $\int_a^b g(x)\,dx$ is the area under the curve, which we can then calculate as the proportion of points in the rectangle $[a,b] \times [c,d]$ under the curve $g$; of course we need to rescale by the total area of $[a,b] \times [c,d]$, i.e. by $A = (b-a)(d-c)$. To transform this the Monte Carlo way, we need to use proper random variables. Which is the random variable involved? We can consider a 2-dimensional uniform random variable $(X,Y)$ on the rectangle $[a,b] \times [c,d]$, consider the indicator function of the event '$Y < g(X)$' and take the expected value of this indicator function, i.e.

$$E\left( 1_{\{Y < g(X)\}} \right) = P(Y < g(X))$$

so, taking $c = 0$ as in the example below, $A \times E(1_{\{Y < g(X)\}})$ is just the integral of $g$. The algorithm is as follows:

(1) $i = 1$, set $S_0 = 0$, $A = (b-a)(d-c)$

(2) extract a uniform random number $x \sim U([a,b])$

(3) extract a uniform random number $y \sim U([c,d])$

(4) if $y < g(x)$ then $S_i = S_{i-1} + 1$, otherwise $S_i = S_{i-1}$

(5) $i = i + 1$

(6) if $i > n$ exit, otherwise go to (2)

(7) $\int_a^b g(x)\,dx \simeq A \cdot S_n/n$

¹ Again, this means that the interval covers the true value 95% of the time.

The next code is an implementation of the above. Just for checking the approximation, we use a polynomial function $g(x) = x^2$ and consider the interval $[a,b] = [0,2]$, therefore $[c,d] = [0,4]$. In this case,

$$\int_0^2 x^2\,dx = \frac{8}{3} \simeq 2.667$$

and the code is as follows:

R> set.seed(123)
R> g <- function(x) x^2
R> a <- 0
R> b <- 2
R> c <- 0
R> d <- 4
R> A <- (b - a) * (d - c)
R> n <- 1e+05
R> x <- runif(n, a, b)
R> y <- runif(n, c, d)
R> A * sum(y < g(x))/n

[1] 2.66792

R> integrate(g, a, b)

2.666667 with absolute error < 3.0e-14

Notice that instead of writing a loop we used the vectorization capabilities of R. Instead of extracting a single number we generate two vectors of random numbers x and y of length n. Thus, y and g(x) are two vectors and y < g(x) returns a vector of comparisons of each element of y with the corresponding element of g(x), hence a vector of TRUE/FALSE. These are the indicator functions. The function sum applied to this vector first converts TRUE/FALSE into 1/0 and then sums this vector returning the number of ones. Divided by n, this is the Monte Carlo estimate of $E(1_{\{Y < g(X)\}})$. Figure 4.1 shows how the random points fill the area under the curve $g(\cdot)$ as the number $n$ of Monte Carlo replications increases.
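The same computation can be packaged as a function that also reports a standard error, since the hit indicator is a Bernoulli variable. This is only a sketch: the helper name `mc_area`, its argument names and the correction term for a rectangle whose lower edge is not zero are assumptions introduced here.

```r
# Hit-or-miss estimate of int_a^b g(x) dx using the rectangle [a, b] x [y_lo, y_hi].
# Hypothetical helper; assumes y_lo <= g(x) <= y_hi on [a, b].
mc_area <- function(g, a, b, y_lo, y_hi, n) {
  x <- runif(n, a, b)
  y <- runif(n, y_lo, y_hi)
  A <- (b - a) * (y_hi - y_lo)
  p_hat <- mean(y < g(x))                          # proportion of points under the curve
  estimate <- A * p_hat + y_lo * (b - a)           # add back the strip below y_lo
  std_error <- A * sqrt(p_hat * (1 - p_hat) / n)   # binomial standard error
  c(estimate = estimate, std.error = std_error)
}

set.seed(123)
res <- mc_area(function(x) x^2, a = 0, b = 2, y_lo = 0, y_hi = 4, n = 1e5)
res
```

The binomial standard error shows why this estimator is fairly noisy: its variance depends on the whole rectangle area, not just on the variability of $g$.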

Exercise 4.1 Find a way to calculate $\pi$ using the Monte Carlo method and write the corresponding R code. [Hint: use the formula of the area of the circle.]

[Figure 4.1 panels: n = 1000, val = 2.7040; n = 10000, val = 2.6704; n = 100000, val = 2.6679]

Figure 4.1 Approximation of the area as a function of the number of Monte Carlo replications n = 1000, 10000, 100000.


4.2 Numerical differentiation 


Consider a function $f(\cdot)$ such that a Taylor expansion up to second order is admissible around some point $x_0$, i.e. in the interval $[x_0 - \epsilon, x_0 + \epsilon]$. Then, the value $f(x)$ of $f$ at point $x \in [x_0 - \epsilon, x_0 + \epsilon]$ can be written as follows:

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2}f''(x_0)(x - x_0)^2 + o((x - x_0)^2).$$

For simplicity, let us denote by $h = x - x_0$, $|h| < \epsilon$, then

$$f'(x_0) = \frac{f(x_0 + h) - f(x_0)}{h} - \frac{1}{2}f''(x_0)h + o(h).$$

By definition of first derivative we know that

$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}.$$

Thus, one can think of approximating the first derivative of a function with the incremental ratio

$$f'(x_0) \simeq \frac{f(x_0 + h) - f(x_0)}{h}$$

for very small positive $h$ and the residual error

$$f'(x_0) - \frac{f(x_0 + h) - f(x_0)}{h} \simeq -\frac{1}{2}f''(x_0)h$$
is proportional to $h$. For example, let $x > 0$ and $f(x) = x^x$ and suppose that we want to calculate the first derivative of $f$ at point $x = 1$, i.e. $f'(1) = 1$. The error of the numerical derivative with $h = 0.01$ is

R> h <- 0.01
R> x0 <- 1
R> err <- 1 - ((x0 + h)^(x0 + h) - x0^x0)/h
R> err

[1] -0.01005033

which is of order $h$. If we take $h = 0.001$ we get

R> h <- 0.001
R> x0 <- 1
R> err <- 1 - ((x0 + h)^(x0 + h) - x0^x0)/h
R> err

[1] -0.001000500
so it is clearly proportional to $h$. Unfortunately, in the above approximations there are two kinds of errors. The first is the truncation error, which comes from the higher order terms in the Taylor expansion. The second is the roundoff error which is strictly related to the internal binary representation of numbers used by computers. Every number $a$ in a computer is internally represented as $a(1 + \epsilon)$ where, in single floating point precision, $\epsilon$ is of order $10^{-7}$. Suppose we choose $x_0 = 10.3$ and $h = 0.0001 = 10^{-4}$. Then, inside the computer both $x_0 = 10.3$ and $x_0 + h = 10.3001$ admit a representation which is exact only up to the $\epsilon$ error, i.e. each number $x$ is represented internally as $x(1 + \epsilon)$, thus the relative error on the effective value of $h$, which is of order $x\epsilon/h$, is in our example of order $10^{-2}$. It is clear that this roundoff error propagates into the calculation of the numerical derivative as well. It is possible to reduce the effect of the roundoff error by appropriate optimal choices of $h$ (Press et al. 2007) and at the same time one can refine the approximation of the derivative in order to reduce the truncation error. Here we explain how to reduce the truncation error. For example, if we consider the third order Taylor expansion of both $f(x_0 + h)$ and $f(x_0 - h)$, i.e.

$$
\begin{aligned}
f(x_0 + h) &= f(x_0) + f'(x_0)h + \frac{1}{2}f''(x_0)h^2 + \frac{1}{6}f'''(x_0)h^3 + o(h^3) \\
f(x_0 - h) &= f(x_0) - f'(x_0)h + \frac{1}{2}f''(x_0)h^2 - \frac{1}{6}f'''(x_0)h^3 + o(h^3)
\end{aligned}
$$

we get that

$$f(x_0 + h) - f(x_0 - h) = 2f'(x_0)h + \frac{1}{3}f'''(x_0)h^3 + o(h^3)$$

and hence, we can introduce the symmetrized numerical derivative

$$f'(x_0) \simeq \frac{f(x_0 + h) - f(x_0 - h)}{2h}$$

and the remainder is of order $h^2$. We can see it empirically as well.

R> h <- 0.01
R> x0 <- 1
R> err <- 1 - ((x0 + h)^(x0 + h) - (x0 - h)^(x0 - h))/(2 * h)
R> err

[1] -5.000083e-05

R> h^2

[1] 1e-04

R> h <- 0.001
R> x0 <- 1
R> err <- 1 - ((x0 + h)^(x0 + h) - (x0 - h)^(x0 - h))/(2 * h)
R> err

[1] -5e-07

R> h^2

[1] 1e-06
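To see the interplay between truncation and roundoff error more systematically, one can tabulate both approximations over a range of step sizes. The sketch below is illustrative only; the grid of values of `h` and the variable names are choices made here.

```r
f <- function(x) x^x
x0 <- 1                                    # f'(1) = 1 is the exact derivative
h <- 10^(-(1:12))                          # step sizes 1e-1, 1e-2, ..., 1e-12

fwd <- (f(x0 + h) - f(x0)) / h             # incremental ratio, error of order h
ctr <- (f(x0 + h) - f(x0 - h)) / (2 * h)   # symmetrized derivative, error of order h^2

tab <- data.frame(h = h,
                  err_forward = abs(1 - fwd),
                  err_central = abs(1 - ctr))
tab
```

For moderate $h$ the errors shrink as predicted by the Taylor expansions; for very small $h$ the subtraction of nearly equal numbers dominates and the errors stop improving, which is the roundoff effect discussed earlier in this section.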

Still, the roundoff error is present, but the overall precision increases to $h^2$ without the need for explicit calculation of higher order derivatives, i.e. without the need to use an explicit second order Taylor expansion. Another approach is Richardson's extrapolation method (Richardson 1911, 1927), an acceleration method which transforms an approximation method of order $h^p$ into one of order $h^{p+1}$. The symmetrized numerical derivative is not unrelated to this. The idea is the following: suppose we have an approximation method such that a certain quantity $Q$ can be approximated as follows:

$$Q = Q(h) + ah^p + O(h^{p+1})$$

where the term $a$ is independent of $h$ (Taylor expansion is one such method). The idea is to eliminate $a$ by taking two different values of $h$, say $h_1$ and $h_2$, so that the term of order $h^p$ cancels. Thus, for example

$$
\begin{aligned}
Q &= Q(h_1) + ah_1^p + O(h_1^{p+1}) \\
Q &= Q(h_2) + ah_2^p + O(h_2^{p+1})
\end{aligned}
$$

and, multiplying the first equation by $h_2^p$ and the second by $h_1^p$ and subtracting, the terms $ah_1^p h_2^p$ cancel and we can write

$$(h_2^p - h_1^p)\,Q = h_2^p Q(h_1) - h_1^p Q(h_2) + O(h^{2p+1}),$$

with $h = \max(h_1, h_2)$. Thus, the Richardson's extrapolation is

$$Q_R = \frac{h_2^p Q(h_1) - h_1^p Q(h_2)}{h_2^p - h_1^p}.$$

The usual choice is to take $h_1 = h$ and $h_2 = \frac{1}{2}h$ so that

$$Q_R = \frac{\left( \frac{h}{2} \right)^p Q(h) - h^p Q\left( \frac{h}{2} \right)}{\left( \frac{h}{2} \right)^p - h^p} = \frac{2^p Q\left( \frac{h}{2} \right) - Q(h)}{2^p - 1}.$$


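If we apply this to the previous example with $p = 1$, taking $Q(h)$ to be the incremental ratio $(f(x_0 + h) - f(x_0))/h$, we obtain

R> h <- 0.001
R> x0 <- 1
R> f <- function(x) x^x
R> Q <- function(h) (f(x0 + h) - f(x0))/h
R> err <- 1 - (2 * Q(h/2) - Q(h))
R> err

which returns an error of about 2.5e-07, i.e. of order $h^2$ rather than $h$.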

The symmetrized derivative is clearly an application of this method. The same idea can be iterated, applying it again to each term on the right-hand side of the naive Richardson's extrapolation, to further increase the quality of the approximation. For our purposes the symmetrized derivative is enough, but in general one can obtain numerical derivatives in R using different packages. One option is the numDeriv package. The package has two main functions, grad and hessian, to calculate the numerical gradients and Hessians of multidimensional functions using the Richardson's extrapolation method. The use is quite simple and we present here an example:

R> require(numDeriv)
R> f <- function(x) x^x
R> grad(f, x = 1)

[1] 1

R> grad(f, x = 1, method = "simple")

[1] 1.0001

Notice that the package numDeriv implements a refined version of the Richardson's extrapolation method. When we specify "simple" in the argument method we get the naive result.
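For readers who prefer not to depend on a package, the extrapolation step with $h_1 = h$ and $h_2 = h/2$ is easy to write by hand. The following is a minimal sketch: the helper name `richardson` and the two difference quotients are assumptions made here, and this is not the refined scheme that numDeriv uses internally.

```r
# One Richardson step for an approximation Q(h) with leading error a * h^p
# (hypothetical helper).
richardson <- function(Q, h, p) (2^p * Q(h / 2) - Q(h)) / (2^p - 1)

f <- function(x) x^x
x0 <- 1                                                       # exact derivative f'(1) = 1
Q_fwd <- function(h) (f(x0 + h) - f(x0)) / h                  # leading error of order h
Q_ctr <- function(h) (f(x0 + h) - f(x0 - h)) / (2 * h)        # leading error of order h^2

h <- 0.01
err_fwd   <- 1 - Q_fwd(h)                   # plain incremental ratio
err_fwd_R <- 1 - richardson(Q_fwd, h, 1)    # extrapolated with p = 1
err_ctr_R <- 1 - richardson(Q_ctr, h, 2)    # extrapolated with p = 2

c(forward = err_fwd, forward_richardson = err_fwd_R, central_richardson = err_ctr_R)
```

The value of `p` must match the leading error term of the method being accelerated: the symmetrized derivative has an expansion in even powers of $h$, so `p = 2` is the appropriate choice there.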


4.3 Root finding 

Calibration of financial models or the method of moments in statistics more or less correspond to the problem of finding the roots of a function. For one-dimensional functions there are many specific algorithms while for the multidimensional case the problem of root finding is sometimes translated into an optimization (minimization/maximization) problem. We consider optimization in the next section; here we focus on the problem of root finding. If the function is a polynomial function, like

$$p(x) = a_0 + a_1 x + \cdots + a_n x^n$$

then $p(x) = 0$ always has $n$ solutions (at least in the complex plane, counted with multiplicity). For example, the second order polynomial $p(x) = ax^2 + bx + c$ has two well-known solutions:

$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

which can be either real or complex, depending on the sign of $\Delta = b^2 - 4ac$. In R these solutions can be found using the polyroot function, specifying only the vector of coefficients $a_0, a_1, \ldots, a_n$. For example, suppose we want to find the solutions of

$$1 + 2x + x^2 = 0$$

which we already know are equal to $-1$ with multiplicity 2. Then, we use

R> polyroot(c(1, 2, 1))

[1] -1-0i -1+0i

For general functions, as in the calculation of implied volatilities (see Section 6.6), we can use the uniroot function. This method requires the specification of the range of values in which to find the solution and, optionally, the precision required and the maximal number of iterations of the numerical method used. The method requires that the function for which we want to find the roots has opposite signs at the extremes of the specified interval. For a continuous function this guarantees that the interval contains a root, and it allows the algorithm to improve its speed of convergence. For example, in the above case if we write something like

R> f <- function(x) 1 + 2 * x + x^2
R> uniroot(f, c(-2, 2))

R will raise an error, because the function f is always non-negative. But something like this:

R> f <- function(x) 2 * x + x^2 - 2
R> uniroot(f, c(-2, 2))

$root
[1] 0.732051

$f.root
[1] 4.266697e-07

$iter
[1] 7

$estim.prec
[1] 6.103516e-05

produces the correct result.
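Since implied volatility was mentioned as a typical root-finding problem, here is a small self-contained sketch of that use of uniroot. The Black and Scholes call price formula is standard, but the helper name `bs_call`, the parameter values and the search interval are illustrative assumptions and are not taken from Section 6.6.

```r
# Black-Scholes price of a European call (hypothetical helper for this example).
bs_call <- function(S, K, r, T_mat, sigma) {
  d1 <- (log(S / K) + (r + sigma^2 / 2) * T_mat) / (sigma * sqrt(T_mat))
  d2 <- d1 - sigma * sqrt(T_mat)
  S * pnorm(d1) - K * exp(-r * T_mat) * pnorm(d2)
}

S <- 100; K <- 100; r <- 0.01; T_mat <- 1
price <- bs_call(S, K, r, T_mat, sigma = 0.2)    # "market" price built with sigma = 0.2

# the implied volatility is the root of sigma -> model price - observed price
obj <- function(sigma) bs_call(S, K, r, T_mat, sigma) - price
iv <- uniroot(obj, interval = c(0.001, 2), tol = 1e-10)$root
iv
```

The interval is chosen so that the objective changes sign between its endpoints, as uniroot requires: the call price is increasing in the volatility, negative pricing error at the lower end and positive at the upper end.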


4.4 Numerical optimization 


If the problem is the minimization or maximization of a function, as is required 
in maximum likelihood estimation, then the problem is relatively easy in the 
one-dimensional case, but quite challenging in the high-dimensional case. Even in 
the one-dimensional case, depending on the algorithm that is used and due to the 
iterative nature of most methods, it is not always clear if the maximum/minimum 
obtained is a global or a local stationary point. The function nlm applies the 
Newton-Raphson method, which is an iterative method based on the fact that, if 
$x_{n+1}$ is the point such that $f(x_{n+1}) = 0$, then 

$$f'(x_n) \approx \frac{f(x_n) - 0}{x_n - x_{n+1}}$$

and hence 

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

This iteration, starting from an initial value and assuming that the gradient of the 
function $f$ is known, is a relatively efficient method to look for zeros of a function. 
If the same idea is applied to the first derivative of $f$ (and the second derivative of 
$f$ is also known) then the Newton-Raphson method is fairly good at finding the points 
in which the first derivative is zero, i.e. the points of maximum and minimum of 
$f$. But the function $f$ should be fairly regular as well. For example, 


R> f <- function(x) 2 * x + x^2 - 2 
R> nlm(f, 0) 


$minimum 
[1] -3 

$estimate 
[1] -1 

$gradient 
[1] 1.000089e-06 

$code 
[1] 1 

$iterations 
[1] 1 

Figure 4.2 The 'wild' function is quite difficult to minimize. 

finds the correct minimum. In the nlm command, the second argument is the 
initial value of the Newton-Raphson sequence. Consider now the so-called wild 
function in Figure 4.2: 

R> f <- function(x) 10 * sin(0.3 * x) * sin(1.3 * x^2) + 1e-05 * 
+     x^4 + 0.2 * x + 80 

Then, nlm finds only a local minimum 

R> nlm(f, 0)$estimate 
[1] -6.314295 
R> nlm(f, 20)$estimate 
[1] 19.93778 

So, this is fast but unsatisfactory. An alternative solution in R is the function 
optim, which is an interface to different optimization methods (for more 
information on each method, the reader should check the documentation page of the 
function optim). For example, the following code is able to find the real minimum 
of the wild function using the simulated annealing method 

R> res <- optim(50, f, method = "SANN", control = list(maxit = 20000, 
+     temp = 20, parscale = 20)) 
R> res 
$par 
[1] -15.81506 

$value 
[1] 67.4678 

$counts 
function gradient 
   20000       NA 

$convergence 
[1] 0 

$message 
NULL 

R> res <- optim(0, f, method = "SANN", control = list(maxit = 20000, 
+     temp = 20, parscale = 20)) 
R> res$par 
[1] -15.66174 

which, as we see, produces a quite stable result compared to nlm. Both nlm 
and optim accept functions f with vector arguments, so they also work in the 
multidimensional case. The optim function also accepts constraints, which is 
very useful in quasi-maximum likelihood estimation for diffusion processes (for 
example, in the estimation of the parameters of the volatility in interest rate 
models). The mle function indeed uses the optim function internally. 
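
The Newton-Raphson update described earlier in this section can be written in a few lines. The sketch below is hand-rolled for illustration only: the function name newton_raphson is hypothetical, the derivatives are supplied analytically, and it should not be read as what nlm does internally.

```r
# Hand-rolled Newton-Raphson for a zero of g, given its derivative dg.
newton_raphson <- function(g, dg, x0, tol = 1e-10, maxit = 100) {
    x <- x0
    for (i in seq_len(maxit)) {
        x_new <- x - g(x) / dg(x)      # x_{n+1} = x_n - g(x_n) / g'(x_n)
        if (abs(x_new - x) < tol) break
        x <- x_new
    }
    x_new
}

# Minimize f(x) = 2x + x^2 - 2 by finding the zero of f'(x) = 2 + 2x.
df  <- function(x) 2 + 2 * x     # first derivative of f
d2f <- function(x) 2             # second derivative of f
xmin <- newton_raphson(df, d2f, x0 = 0)
```

For a quadratic function the first derivative is linear, so the iteration lands on the stationary point after a single step; for a function like the wild function the result depends strongly on the starting value.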


4.5 Simulation of stochastic processes 

4.5.1 Poisson processes 

The simulation of a homogeneous Poisson process is quite easy, in that the intervals 
between two Poissonian events are exponentially distributed. Hence the simulation 
of exponential random variables with rate $\lambda$ is enough. The only problem is that, 
for a given time length, say $[0, T]$, the number of events that will occur is not 
known a priori, but only the average number, which is $\lambda T$. Hence, either a large 
number of exponential events should be generated in order to have a full trajectory 
or, otherwise, an iterative scheme should be used. We present both schemes: 

R> set.seed(123) 
R> lambda <- 0.8 
R> T <- 10 
R> avg <- lambda * T 
R> avg 
[1] 8 
R> t <- 0 
R> N <- 0 
R> k <- 0 
R> continue <- TRUE 
R> while (continue) { 
+     event <- rexp(1, lambda) 
+     if (sum(t) + event < T) { 
+         k <- k + 1 
+         N <- c(N, k) 
+         t <- c(t, event) 
+     } 
+     else { 
+         continue <- FALSE 
+         t <- cumsum(t) 
+         N <- c(N, k) 
+         t <- c(t, T) 
+     } 
+ } 
R> N 
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 12 
R> t 
 [1]  0.000000  1.054322  1.775084  3.436403  3.475875  3.546138  3.941765 
 [8]  4.334549  4.516133  7.923928  7.960370  9.216408  9.816676 10.000000 

and the noniterative version 

R> set.seed(123) 
R> t <- cumsum(c(0, rexp(10 * avg, lambda))) 
R> last <- which(t > T)[1] 
R> t <- t[1:last] 
R> t[last] <- T 
R> N <- c(0, 1:(length(t) - 2), length(t) - 2) 
R> N 
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 12 
R> t 
 [1]  0.000000  1.054322  1.775084  3.436403  3.475875  3.546138  3.941765 
 [8]  4.334549  4.516133  7.923928  7.960370  9.216408  9.816676 10.000000 
R> plot(t, N, type = "s", main = "Poisson process", ylab = expression(N(t)), 
+     xlim = c(0, T)) 


Figure 4.3 contains a plot of the simulated path. The simulation of a nonhomogeneous 
Poisson process is a bit more complicated and is based on the acceptance/rejection 
or thinning method introduced by Lewis and Shedler (1979). We will refer to it 
as the Lewis method. Optimized versions of it can be found in Ross (2006) or Ogata 
(1981). 

Figure 4.3 Simulation of homogeneous Poisson process. 

Figure 4.4 Algorithm to simulate inhomogeneous Poisson process. 

The method is as follows: let $\lambda(t)$ be the intensity function and assume 
there exists a constant $\lambda$ such that 

$$\lambda(t) \le \lambda, \quad 0 \le t \le T.$$

A homogeneous Poisson process with constant rate $\lambda$ is simulated. When an event 
occurs at time $t$, this event is considered as an event for the nonhomogeneous Poisson 
process with probability $\lambda(t)/\lambda$. The selected events are indeed Poissonian events 
with rate $\lambda(t)$. The algorithm is represented in Figure 4.4. An example of code 
which performs the simulation is as follows: 

R> set.seed(123) 
R> lambda <- 1.1 
R> T <- 20 
R> E <- 0 
R> t <- 0 
R> while (t < T) { 
+     t <- t - 1/lambda * log(runif(1)) 
+     if (runif(1) < sin(t)/lambda) 
+         E <- c(E, t) 
+ } 
R> plot(E, 0:(length(E) - 1), type = "s", ylim = c(-4, length(E)), 
+     ylab = expression(N(t)), xlab = "t") 
R> curve(-3 + sin(x), 0, 20, add = TRUE, lty = 2, lwd = 2) 

The trajectory is represented in Figure 4.5. 
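
The thinning idea can also be wrapped in a small reusable function. The following is a sketch under stated assumptions: the name sim_nhpp, its arguments, and the intensity $1 + \sin(t)$ with bound 2 are chosen here for illustration (this intensity is nonnegative everywhere, which keeps the acceptance probability in [0, 1]); candidates beyond the horizon are discarded.

```r
# Thinning: lambda_fun is the intensity, lambda_max an upper bound on [0, T].
sim_nhpp <- function(lambda_fun, lambda_max, T) {
    events <- numeric(0)
    t <- 0
    repeat {
        t <- t + rexp(1, lambda_max)          # next candidate event time
        if (t > T) break                      # stop at the horizon
        if (runif(1) < lambda_fun(t) / lambda_max)
            events <- c(events, t)            # accept with prob. lambda(t)/lambda_max
    }
    events
}

set.seed(123)
E <- sim_nhpp(function(t) 1 + sin(t), lambda_max = 2, T = 20)
plot(c(0, E), 0:length(E), type = "s", xlab = "t", ylab = "N(t)")
```

The tighter the bound lambda_max is, the fewer candidate events are rejected.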

4.5.2 Telegraph process 

Recall that the telegraph process is defined as 

$$X_t = x_0 + \int_0^t V_s\,ds, \qquad V_t = V_0(-1)^{N_t}, \quad t > 0,$$

where $N_t$ is a Poisson process and $V_0$ is a discrete random variable independent 
of $N_t$ and taking values $+c$ and $-c$ with equal probability. Thus, simulation is 
very easy when a path of $N_t$ is available: we only need to simulate an initial value 
of the velocity $V_0$ and then discretize the above integral. More precisely, let $\tau_i$ be 
the length of the random interval between two subsequent Poisson events and for 
simplicity we set $\tau_{N_t} = T - t_{N_t - 1}$. Then we can rewrite $X_t$ as a random sum 

$$X_t = x_0 + V_0 \sum_{i=1}^{N_t} \tau_i (-1)^i.$$

Figure 4.5 Simulation of nonhomogeneous Poisson process with rate 
$\lambda(t) = \sin(t)$. 

Figure 4.6 Simulation of the telegraph process (top) based on the underlying 
Poisson process (bottom). 



R> T <- 20 
R> lambda <- 5 
R> avg <- lambda * T 
R> t <- cumsum(c(0, rexp(10 * avg, lambda))) 
R> last <- which(t > T)[1] 
R> t <- t[1:last] 
R> t[last] <- T 
R> N <- c(0, 1:(length(t) - 2), length(t) - 2) 
R> c <- 2 
R> V0 <- sample(c(-c, +c), 1) 
R> ds <- diff(t) 
R> nds <- length(ds) 
R> x0 <- 0 
R> X <- c(x0, x0 + cumsum(V0 * ds * (-1)^(1:nds))) 

We can now plot both trajectories together on the same graph 

R> par(mfrow = c(2, 1)) 
R> par(mar = c(3, 4, 0.5, 0.1)) 
R> plot(t, X, type = "l") 
R> plot(t, N, type = "s") 

and the result is shown in Figure 4.6. For pure fun, we can see empirically what 
was explained in Section 3.12.3 on the convergence of the telegraph process to 
the Brownian motion. For this we need to set $\lambda = c^2$ and let $c \to \infty$, which in 
our case means a large value of c. 

R> T <- 1 
R> c <- 100 
R> lambda <- c^2 
R> avg <- lambda * T 
R> t <- cumsum(c(0, rexp(10 * avg, lambda))) 
R> last <- which(t > T)[1] 
R> t <- t[1:last] 
R> t[last] <- T 
R> N <- c(0, 1:(length(t) - 2), length(t) - 2) 
R> V0 <- sample(c(-c, +c), 1) 
R> ds <- diff(t) 
R> nds <- length(ds) 
R> x0 <- 0 
R> X <- c(x0, x0 + cumsum(V0 * ds * (-1)^(1:nds))) 

Figure 4.7 shows a limiting trajectory for the telegraph process which looks 
qualitatively close to the path of a genuine Brownian motion. 

4.5.3 One-dimensional diffusion processes 

Let $X = \{X_t, t \ge 0\}$ be a diffusion process solution to the stochastic differential 
equation 

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t \qquad (4.2)$$

with some initial condition $X_0$. We assume that the drift and diffusion 
coefficients satisfy the usual regularity assumptions as in Section 3.15.1 to ensure the 
existence of a solution of (4.2). The most used scheme to simulate stochastic 
differential equations is the Euler-Maruyama scheme. This method is far from being 
optimal but it is one of the few which is also available for the multidimensional 
case and for Lévy processes. We discretize the interval $[0, T]$ into a 
grid of points $0 = t_0 < t_1 < \cdots < t_N = T$. This grid does not need to be regular. 
To simplify the exposition we use the notation $X(t_i) = X_i$. The approximated 
Euler-Maruyama solution is a new continuous stochastic process $Y$ satisfying the 
iterative scheme 

$$Y_{i+1} = Y_i + b(t_i, Y_i)(t_{i+1} - t_i) + \sigma(t_i, Y_i)(B_{i+1} - B_i), \quad i = 0, 1, \ldots, N - 1, \qquad (4.3)$$

with $Y_0 = X_0$. Outside the points of the grid, the process is defined via linear 
interpolation. So, the implementation of this method only requires the 
simulation of the increments of the Brownian motion $B_{i+1} - B_i$, which are Gaussian 
distributed with zero mean and variance equal to the time interval $t_{i+1} - t_i$, so 
$B_{i+1} - B_i \sim \sqrt{t_{i+1} - t_i} \cdot N(0, 1)$. The algorithm is represented in Figure 4.9. The 
stability of this method depends on the regularity of the diffusion coefficient and 
on the distance between time points on the grid. Several modifications of this 
method exist in the one-dimensional case and the most used is the Milstein (1978) 
scheme. The idea behind the method is to apply Itô's Lemma to obtain a second-order 
expansion and increase accuracy. The resulting approximating process $Y$ 
satisfies the new iterative scheme 

$$Y_{i+1} = Y_i + b(t_i, Y_i)(t_{i+1} - t_i) + \sigma(t_i, Y_i)(B_{i+1} - B_i) + \frac{1}{2}\sigma(t_i, Y_i)\,\sigma_x(t_i, Y_i)\left\{(B_{i+1} - B_i)^2 - (t_{i+1} - t_i)\right\}$$

where $\sigma_x(t, x)$ is the partial derivative of $\sigma(t, x)$ with respect to the variable $x$. 
When the conditional distribution of $X_t | X_{t-\Delta}$ is available, it is possible to exploit 
the Markov property of the diffusion process and obtain an exact simulation of 
the path. Unfortunately, this is a rare event, so other approximation schemes 
exist. For an updated review, see Iacus (2008). 

Figure 4.7 Limiting path of the telegraph process when $\lambda = c^2$ and $c \to \infty$. 
This trajectory looks qualitatively close to a Brownian motion path. 

The package sde implements 
most of the available methods in the literature via the sde.sim function. The 
package only simulates one-dimensional paths or m independent paths but not 
a real multidimensional process. The use is quite simple. Assume we want to 
simulate the stochastic process $X$ solution to 

$$dX_t = (1 - 2X_t)\,dt + \sqrt{1 + X_t^2}\,dB_t, \quad X_0 = 2, \quad 0 \le t \le 1.$$


We first need to define R expressions to describe the drift and diffusion 
coefficients as functions and then pass them to the sde.sim function 


R> require(sde) 
R> set.seed(123) 
R> b <- expression(1 - 2 * x) 
R> s <- expression(sqrt(1 + x^2)) 
R> X <- sde.sim(X0 = 2, T = 1, drift = b, sigma = s) 

The sde.sim function outputs an object of type ts which can be handled 
in R as such, e.g. it can be plotted via plot(X). The interface of sde.sim is 
quite flexible and, by default, it implements the Euler-Maruyama scheme with 
predictor-corrector (see e.g. Kloeden et al. 2000). If, for example, we want to 
use the Milstein scheme, we can specify it in the following way: 

R> X <- sde.sim(X0 = 2, T = 1, drift = b, sigma = s, method = "milstein") 

For the few models for which the conditional distribution is known, one can 
specify the argument model and the additional vector of parameters theta which 
characterize the model. For example, the geometric Brownian motion or Black & 
Scholes model, which we analyze in detail in Chapter 5, satisfies the following 
stochastic differential equation: 

$$dX_t = \theta_1 X_t\,dt + \theta_2 X_t\,dB_t, \quad X_0 = x_0.$$

Assume we want to simulate it for $\theta_1 = 1$, $\theta_2 = 0.5$ and $X_0 = 2$. Then we use 
the following code: 

R> X <- sde.sim(X0 = 2, T = 1, model = "BS", theta = c(1, 0.5)) 

For more accurate simulation, one can increase the number of points in the grid 
by choosing the appropriate N (by default N = 100) or by specifying the $\Delta$ increment 
using the argument delta. The next code makes use of the argument N and the plot 
can be seen in Figure 4.8. 

R> set.seed(123) 
R> X <- sde.sim(X0 = 2, T = 1, model = "BS", theta = c(1, 0.5), 
+     N = 5000) 
R> plot(X) 


The sde.sim function implements Euler-Maruyama, both Milstein schemes, the 
KPSS method (Kloeden et al. 1996), the Local Linearization Method for 
homogeneous (Ozaki 1985, 1992) and inhomogeneous stochastic differential equations 
(Shoji 1995, 1998) and the Exact Algorithm (Beskos and Roberts 2005). For a 
detailed description of all arguments and options of the sde.sim function, please 
refer to the man page of the function in the package sde. 

Figure 4.8 Simulation of the Black and Scholes model using sde.sim. 

Figure 4.9 Algorithm to simulate stochastic differential equations. [Flowchart; first step: set i = 1, Y[i] = y_0, initialize time vector t.] 
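
The Euler-Maruyama and Milstein iterations can also be coded directly in base R, which may help in reading the schemes. This is a sketch and not the implementation used by sde.sim: the function name sim_sde and its arguments are hypothetical, the grid is assumed regular, and no predictor-corrector step is applied. The coefficients are passed as R functions of (t, x); if the derivative sx of the diffusion coefficient is supplied, the Milstein correction is added.

```r
# Euler-Maruyama (and optionally Milstein) on a regular grid of N steps.
# b, s: drift and diffusion coefficient; sx: derivative of s with respect to x.
sim_sde <- function(x0, T = 1, N = 100, b, s, sx = NULL) {
    dt <- T / N
    tt <- (0:N) * dt
    Y <- numeric(N + 1)
    Y[1] <- x0
    dB <- sqrt(dt) * rnorm(N)                  # Brownian increments
    for (i in 1:N) {
        Y[i + 1] <- Y[i] + b(tt[i], Y[i]) * dt + s(tt[i], Y[i]) * dB[i]
        if (!is.null(sx))                      # Milstein correction term
            Y[i + 1] <- Y[i + 1] +
                0.5 * s(tt[i], Y[i]) * sx(tt[i], Y[i]) * (dB[i]^2 - dt)
    }
    ts(Y, start = 0, deltat = dt)
}

set.seed(123)
b  <- function(t, x) 1 - 2 * x
s  <- function(t, x) sqrt(1 + x^2)
sx <- function(t, x) x / sqrt(1 + x^2)
Y_euler    <- sim_sde(2, T = 1, N = 1000, b = b, s = s)
Y_milstein <- sim_sde(2, T = 1, N = 1000, b = b, s = s, sx = sx)
```

With a zero diffusion coefficient the scheme reduces to the explicit Euler method for ordinary differential equations, which gives a simple sanity check.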


4.5.4 Multidimensional diffusion processes 

The package yuima [2] allows us to simulate multidimensional diffusion processes 
via the Euler-Maruyama scheme. The choice between the sde and yuima packages 
depends on the specific application. The latter assumes the description of more 
structure and it is not limited to diffusion processes. With the yuima package 
the user first needs to construct the model before simulation. Let $X$ be an 
$m$-dimensional diffusion process solution to 

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \quad t \in [0, T], \qquad (4.4)$$

where $B_t$ is an $r$-dimensional standard Brownian motion, i.e. a vector process 
$B_t = (B_t^1, B_t^2, \ldots, B_t^r)^\top$ where all components are independent standard 
Brownian motions. In the above $b(t, x)$ is an $\mathbb{R}^m$-valued function and $\sigma(t, x)$ is an 
$m \times r$ matrix-valued function. Let $\{t_i, i = 0, 1, \ldots, n\}$ be a grid of times such 
that $t_0 = 0$ and $t_n = T$ and let us denote $X_{t_i}$ by $X_i$, $i = 0, 1, \ldots, n$. The 
Euler-Maruyama scheme approximates the solution $X_t$ with the discretization of the 
stochastic differential equation (4.4) in the following manner 

$$X_i = X_{i-1} + b(t_{i-1}, X_{i-1})\,\Delta_i + \sigma(t_{i-1}, X_{i-1})\,\Delta B_i \qquad (4.5)$$

with $\Delta_i = t_i - t_{i-1}$ and $\Delta B_i = B_{t_i} - B_{t_{i-1}}$. Then, conditionally on the value of 
$X_{i-1}$, the random variable $X_i$ can be obtained by simulating the increments of the 
Brownian motion $\Delta B_i$, which is a multivariate Gaussian random variable with 
independent components. Theoretical properties of the Euler-Maruyama scheme 
can be found in Kloeden and Platen (1999). Other approximation schemes for the 
one-dimensional case can be found in Iacus (2008). We now see some examples 
on how to simulate multidimensional stochastic differential equations with the 
yuima package. Let us consider the following example of a two-dimensional 
diffusion process $X_t = (X_t^1, X_t^2)$ driven by three independent Brownian motions 
$B_t = (B_t^1, B_t^2, B_t^3)$ and solution of the following system of stochastic differential 
equations 

$$dX_t^1 = -3X_t^1\,dt + dB_t^1 + X_t^2\,dB_t^3$$

$$dX_t^2 = -(X_t^1 + 2X_t^2)\,dt + X_t^1\,dB_t^1 + 3\,dB_t^2.$$

In order to be described in a form which is suited for the yuima package, we 
rewrite it in matrix form as follows: 

$$\begin{pmatrix} dX_t^1 \\ dX_t^2 \end{pmatrix} = \begin{pmatrix} -3X_t^1 \\ -X_t^1 - 2X_t^2 \end{pmatrix} dt + \begin{pmatrix} 1 & 0 & X_t^2 \\ X_t^1 & 3 & 0 \end{pmatrix} \begin{pmatrix} dB_t^1 \\ dB_t^2 \\ dB_t^3 \end{pmatrix}.$$

[2] The package is available at http://R-Forge.R-Project.org/projects/yuima 
Now we prepare the model using the setModel constructor function: 

R> require(yuima) 
R> sol <- c("x1", "x2") 
R> b <- c("-3*x1", "-x1-2*x2") 
R> s <- matrix(c("1", "x1", "0", "3", "x2", "0"), 2, 3) 
R> model <- setModel(drift = b, diffusion = s, solve.variable = sol) 

The vector sol defines the variables which are used to solve numerically the 
stochastic differential equations. If not specified, it is assumed to be variable x 
in each equation. Similarly for the time variable which, by default, is assumed 
to be t. Now we are ready to simulate the process using the generic R function 
simulate in the following way: 

R> set.seed(123) 

R> X <- simulate(model, n = 1000) 

YUIMA: 'delta' (re)defined. 

Now X contains the simulated path. Notice that X is not a simple ts object 
but an S4 object of class yuima-data. The internal storage is of class zoo and 
can be extracted from X using the get.zoo.data method. Clearly, other 
methods exist, such as, for instance, the plot method. The output of plot is given 
in Figure 4.10. 

R> plot(X, plot.type = "single", lty = c(1, 3), ylab = "X") 
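
To make the discretization (4.5) concrete, the same two-dimensional system can be stepped by hand in base R. This is only a sketch of the scheme, not of what yuima does internally: the function name sim_2d is hypothetical and the initial value c(1, 1) is an assumption made for illustration.

```r
# Euler-Maruyama for the 2-dimensional system driven by 3 Brownian motions.
sim_2d <- function(x0 = c(1, 1), T = 1, n = 1000) {
    dt <- T / n
    X <- matrix(NA_real_, n + 1, 2)
    X[1, ] <- x0
    for (i in 1:n) {
        x <- X[i, ]
        drift <- c(-3 * x[1], -x[1] - 2 * x[2])
        # 2 x 3 diffusion matrix, filled column by column like the matrix s
        sigma <- matrix(c(1, x[1], 0, 3, x[2], 0), 2, 3)
        dB <- sqrt(dt) * rnorm(3)             # independent Gaussian increments
        X[i + 1, ] <- x + drift * dt + as.vector(sigma %*% dB)
    }
    X
}

set.seed(123)
X2 <- sim_2d()
matplot((0:1000) / 1000, X2, type = "l", lty = c(1, 3), xlab = "t", ylab = "X")
```

Each row of the returned matrix is the state of the process at one grid time.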

4.5.5 Lévy processes 

As we have seen in Section 3.18, Lévy processes are characterized by the fact that 
they have independent increments. This makes the simulation of such processes 
quite easy in some cases when it is possible to simulate random numbers from 
the distribution of the increments. One particular process which will be used in 
Chapter 8 for option pricing is the exponential Lévy model 

$$S_t = S_0 e^{Z_t}$$

so that the log-returns of the price process $S_t$, i.e. $\log(S_{t+\Delta}/S_t) = Z_{t+\Delta} - Z_t$, are 
distributed as the increments of the Lévy process $Z_t$. Thus, in this section we 
show how to simulate several types of Lévy processes and in Section 5.4 we 
describe some statistical techniques to estimate the parameters of these processes 
from financial data. The yuima package is one of the preferred ways to simulate 
one or multidimensional Lévy paths but also stochastic differential equations 
with jumps, which we consider separately in the next section. The package can 
simulate Lévy processes with the following classes of increments: 

Figure 4.10 Simulation of two-dimensional diffusion process using yuima 
package. 

• gamma, bilateral gamma and multivariate normal gamma studied in the 
context of option pricing in Küchler and Tappe (2008a,b, 2009); 

• inverse Gaussian and multivariate normal inverse Gaussian, originally 
introduced in Tweedie (1947) and studied in the context of finance by, 
e.g. Barndorff-Nielsen and Shephard (2001); 

• univariate stable (possibly skewed) and univariate exponentially tempered 
stable distributions. 

Notice that some processes, like the more famous Variance Gamma process, are 
special cases of the above. For more information on each of the distributions 
it is possible to refer to the manual of the yuima package; here we present 
some examples of use. All the above Lévy processes can be simulated exactly 
using the Euler approximation. In the yuima package the approach used is to transform a 
Wiener process into a pure jump Lévy process in the following way: let 
$Z_\Delta = Z_{t+\Delta} - Z_t$ be the increment of the Lévy process. Then, it is possible to 
express $Z_\Delta$ as a normal variance mean mixture with some subordinator $\tau_\Delta$ 

$$Z_\Delta \sim \mu\Delta + \tau_\Delta\Lambda\beta + (\tau_\Delta\Lambda)^{1/2}\xi$$

where 

• $\xi$ is a $d$-dimensional Gaussian random variable independent of $\tau_\Delta$; 

• $\beta$ is a $d$-vector related to the skewness of the distribution; 

• $\mu$ is a $d$-dimensional vector of location parameters; 

• $\Lambda$ is a $d \times d$ symmetric and positive definite matrix which represents the 
matrix of scale parameters of the distribution; 

• $(\tau_\Delta\Lambda)^{1/2}$ is the positive definite square root of $\tau_\Delta\Lambda$. 

To avoid redundancy, it is necessary to set $\det(\Lambda) = 1$ and, in the 
one-dimensional case $d = 1$, we have $\Lambda = 1$ so that the mixture takes the 
form: 

$$Z_\Delta \sim \mu\Delta + \tau_\Delta\beta + \sqrt{\tau_\Delta}\,\xi.$$

For example, consider the multivariate normal gamma distributed increments, 
i.e. 

$$Z_\Delta \sim N\Gamma_d(\lambda\Delta, \alpha, \beta, \mu\Delta, \Lambda)$$

with $\lambda > 0$, $\alpha > 0$, $\alpha^2 > \beta^\top\Lambda\beta$, $\det(\Lambda) = 1$ and $d$-dimensional density 

$$p(z; \lambda\Delta, \alpha, \beta, \mu\Delta, \Lambda) = \frac{e^{\beta^\top(z - \mu\Delta)}\,(\alpha^2 - \beta^\top\Lambda\beta)^{\lambda\Delta}\,K_{\lambda\Delta - d/2}\big(\alpha\,Q(z; \mu\Delta, \Lambda)\big)\,\{Q(z; \mu\Delta, \Lambda)\}^{\lambda\Delta - d/2}}{\Gamma(\lambda\Delta)\,\pi^{d/2}\,2^{d/2 + \lambda\Delta - 1}\,\alpha^{\lambda\Delta - d/2}}$$

where $Q(z; \mu\Delta, \Lambda) = \sqrt{(z - \mu\Delta)^\top\Lambda^{-1}(z - \mu\Delta)}$ and $K_\lambda$ denotes the modified 
Bessel function of the third kind with index $\lambda$, which in R can be obtained with 
besselK. In this case, the algorithm is as follows: 

• generate $\tau_\Delta$ with rgamma(1, $\lambda\Delta$, $(\alpha^2 - \beta^\top\Lambda\beta)/2$); 

• generate $\xi$ with rmvnorm; 

• set $Z_\Delta = \mu\Delta + \tau_\Delta\Lambda\beta + (\tau_\Delta\Lambda)^{1/2}\xi$; 

then $Z_\Delta$ is distributed as $N\Gamma_d(\lambda\Delta, \alpha, \beta, \mu\Delta, \Lambda)$. In the next section we see 
that these processes can be simulated as special cases of stochastic differential 
equations with jumps. 
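
In the one-dimensional case ($d = 1$, $\Lambda = 1$) the three steps of the algorithm need only rgamma and rnorm. The sketch below is written in base R for illustration and is not the yuima implementation; the function name r_ngamma_incr and the parameter values are assumptions.

```r
# One-dimensional normal gamma increments over a time step Delta.
# Mixture: Z = mu * Delta + beta * tau + sqrt(tau) * xi, with tau gamma distributed.
r_ngamma_incr <- function(n, lambda, alpha, beta, mu, Delta) {
    stopifnot(lambda > 0, alpha > 0, alpha^2 > beta^2)
    tau <- rgamma(n, shape = lambda * Delta, rate = (alpha^2 - beta^2) / 2)
    xi <- rnorm(n)                               # independent of tau
    mu * Delta + beta * tau + sqrt(tau) * xi
}

set.seed(123)
Delta <- 0.01
dZ <- r_ngamma_incr(1000, lambda = 2, alpha = 3, beta = 0.5, mu = 0, Delta = Delta)
Z <- cumsum(c(0, dZ))       # path of the Lévy process on a grid of step Delta
plot((0:1000) * Delta, Z, type = "l", xlab = "t", ylab = "Z")
```

Because the increments are independent and identically distributed, the path is obtained by a cumulative sum; the mean of one increment is $\mu\Delta + \beta\,E[\tau_\Delta]$, which can be used as a quick numerical check.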
4.5.6 Simulation of stochastic differential equations with jumps 

We have introduced stochastic differential equations with jumps in Section 3.18.7 
of the form: 

$$dX_t = a(X_t)\,dt + b(X_t)\,dW_t + \int_{|z| > 1} c(X_{t-}, z)\,\mu(dt, dz) + \int_{0 < |z| \le 1} c(X_{t-}, z)\{\mu(dt, dz) - \nu(dz)\,dt\},$$

where $\mu$ is the random measure associated with the jumps of $Z$, 

$$\mu(dt, dz) = \sum_{s > 0} 1_{\{\Delta Z_s \ne 0\}}\,\delta_{(s, \Delta Z_s)}(dt, dz),$$

and $\delta$ denotes the Dirac measure. The process $Z_t$ is the driving pure-jump Lévy 
process of the form: 

$$Z_t = \int_0^t\!\int_{|z| \le 1} z\{\mu(ds, dz) - \nu(dz)\,ds\} + \int_0^t\!\int_{|z| > 1} z\,\mu(ds, dz).$$

To some extent, the yuima package covers most of the cases described by the 
above formulas. There are too many aspects to discuss 
here, including the case of processes with finite or infinite activity, and 
pure jump (as seen in the previous section) or compound type specification. The 
reader is invited to follow the developments of the Yuima Project and check the 
latest documentation available. Here we just show how simple it is to simulate 
stochastic differential equations with jumps in two simple situations. For example, 
suppose one wants to simulate $Z_t$ which is a Compound Poisson Process (i.e. 
jumps follow some distribution, e.g. Gaussian). Then it is possible to consider 
the following SDE with jumps: 

$$dX_t = a(X_t)\,dt + b(X_t)\,dB_t + dZ_t.$$

Let us set an intensity of $\lambda = 10$ in the Compound Poisson Process and choose 
standard Gaussian jumps. In this case, it is possible to extend the basic yuima 
model to allow for jumps with compound Poisson specification using the flag CP 
in the argument measure.type. For example, if we want to simulate the process 

$$dX_t = -\theta X_t\,dt + \sigma\,dB_t + dZ_t$$

where $\theta$ and $\sigma$ are some parameters, we proceed as follows: 

R> require(yuima) 
R> modCP <- setModel(drift = c("-theta*x"), diffusion = "sigma", 
+     jump.coeff = "1", measure = list(intensity = "10", 
+         df = list("dnorm(z, 0, 1)")), 
+     measure.type = "CP", solve.variable = "x") 

and if we want to simulate it we simply proceed as in the diffusion case 

R> set.seed(123) 
R> X <- simulate(modCP, true.p = list(theta = 1, sigma = 3), 
+     n = 1000) 
YUIMA: 'delta' (re)defined. 

which we can plot in the usual way with 

R> plot(X, main = "I'm jumping!") 

Figure 4.11 Simulation of stochastic differential equations with jumps using the 
compound Poisson specification. [Plot title: I'm jumping!] 
and obtain the plot in Figure 4.11. If, instead, we want to simulate specifying 
the Lévy increments we use the switch code in the argument measure.type. For 
example, suppose we want to simulate a pure jump process with drift but without 
the Gaussian component, e.g. the following Ornstein-Uhlenbeck process of Lévy 
type, solution to the stochastic differential equation 

$$dX_t = -X_t\,dt + dZ_t.$$

We proceed as follows: 

R> modPJ <- setModel(drift = "-x", xinit = 1, jump.coeff = "1", 
+     measure.type = "code", measure = list(df = "rIG(z, 1, 0.1)")) 
YUIMA: Solution variable (lhs) not specified. Trying to 
use state variables. 
R> set.seed(123) 
R> Y <- simulate(modPJ, Terminal = 10, n = 10000) 
YUIMA: 'delta' (re)defined. 

and we can see its simulated path in Figure 4.12. It is also possible to specify the 
jump coefficient $c(\cdot)$ using the argument jump.coeff in the setModel function 
so the complete model of the form $dX_t = a(X_t)\,dt + b(X_t)\,dW_t + c(X_t)\,dZ_t$ can 
be fully specified. As mentioned, the number of Lévy processes which can be 
simulated with the yuima package is growing constantly and it is not worth 
listing the different options here. A nice review of simulation schemes for Lévy 
processes can be found in Schoutens (2003) and Cont and Tankov (2004). 
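
For the compound Poisson case, the Euler discretization can be sketched in base R without yuima. The code below is an illustration under stated assumptions, not the yuima algorithm: the function name sim_ou_cp is hypothetical, the initial value 0 is assumed, and within each time step the number of jumps is drawn from a Poisson distribution and the sum of k standard Gaussian jumps is drawn as a single N(0, k) variable.

```r
# Euler scheme for dX = -theta * X dt + sigma dB + dZ, where Z is a compound
# Poisson process with the given intensity and standard Gaussian jumps.
sim_ou_cp <- function(x0 = 0, theta = 1, sigma = 3, intensity = 10,
                      T = 1, n = 1000) {
    dt <- T / n
    X <- numeric(n + 1)
    X[1] <- x0
    for (i in 1:n) {
        k <- rpois(1, intensity * dt)        # number of jumps in the step
        jump <- sqrt(k) * rnorm(1)           # sum of k N(0, 1) jumps is N(0, k)
        X[i + 1] <- X[i] - theta * X[i] * dt +
            sigma * sqrt(dt) * rnorm(1) + jump
    }
    ts(X, start = 0, deltat = dt)
}

set.seed(123)
Xcp <- sim_ou_cp()
plot(Xcp)
```

Setting sigma and intensity to zero removes both sources of noise, so the recursion becomes the deterministic Euler scheme for the drift alone.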

4.5.7 Simulation of Markov switching diffusion processes 

For simplicity in this section we consider the one-dimensional case only. Consider 
the Markov switching diffusion process $X$ solution to the stochastic differential 
equation 

$$dX_t = f(X_t, \alpha_t)\,dt + g(X_t, \alpha_t)\,dB_t, \quad X_0 = x_0, \quad \alpha(0) = \alpha, \qquad (4.6)$$

where $\{B_t, t \ge 0\}$ is the standard Brownian motion and $\alpha_t$ is a finite-state Markov 
chain in continuous time with state space $S$ and infinitesimal generator $Q = [q_{ij}]$ 
(see Definition 3.4.17), i.e. the entries of $Q$ are such that $q_{ij} \ge 0$ for $i \ne j$ and 
$\sum_{j \in S} q_{ij} = 0$ for each $i \in S$. Remember that the transition probability matrix, at 
any given time instant and lag $t$, is given by the matrix exponential 

$$P(t) = \exp(Qt).$$

Figure 4.12 Simulation of an Ornstein-Uhlenbeck process of Lévy type without 
Gaussian component. [Plot title: I'm jumping as well!] 

The drift and diffusion coefficient are such that Equation (4.6) has a unique 
solution in distribution for each initial condition. For any $\varepsilon > 0$, define $\alpha_n = \alpha_{\varepsilon n}$; 
this sequence $\{\alpha_n\}$ is called the $\varepsilon$-skeleton of the continuous-time Markov chain 
$\alpha_t$ (see Chung 1967). It can be shown that $\alpha_n$ is a discrete-time Markov 
chain with one-step transition probability matrix 

$$P = [p_{ij}] = \exp(\varepsilon Q).$$

Notice that while $P(t)$ above is time-dependent for $\alpha_t$, the matrix $P$ is not and 
the Markov chain $\alpha_n$ has stationary transition probabilities. We use the notation 
$\Delta t = \varepsilon$ according to previous sections. As in the diffusion case, we can apply the 
Euler-Maruyama scheme to Equation (4.6) to obtain the following discretization 

$$X_{n+1} = X_n + f(X_n, \alpha_n)\,\Delta t + g(X_n, \alpha_n)\sqrt{\Delta t}\,Z_n$$

with $Z_n \sim N(0, 1)$. This scheme converges weakly to the solution of (4.6) as 
$\Delta t \to 0$ as proved in Yin et al. (2005). Although we did not dedicate a specific 
section to the simulation of Markov chains, we now show a generic algorithm 
which we will use to simulate $\alpha_n$. We keep this function simMarkov separate so 
one can use it individually. 


R> simMarkov <- function(x0, n, x, P) { 
+     mk <- numeric(n + 1) 
+     mk[1] <- x0 
+     state <- which(x == x0) 
+     for (i in 1:n) { 
+         mk[i + 1] <- sample(x, 1, prob = P[state, ]) 
+         state <- which(x == mk[i + 1]) 
+     } 
+     return(ts(mk)) 
+ } 

This function needs an initial state x0, the number of new observations to simulate 
n, a vector of states x representing the state space and a transition matrix P. The 
rows of the transition matrix are associated with the elements of the state space 
vector x. The result is a time series of class ts. Suppose we want to simulate a 
Markov chain with state space $S = \{1, 2, 3\}$ and transition matrix 

$$P = \begin{pmatrix} 0.1 & 0.1 & 0.8 \\ 0.5 & 0.2 & 0.3 \\ 0.3 & 0.3 & 0.4 \end{pmatrix}$$

then we will use the following R code 

R> P <- matrix(c(0.1, 0.5, 0.3, 0.1, 0.2, 0.3, 0.8, 0.3, 0.4), 3, 
+     3) 
R> set.seed(123) 
R> X <- simMarkov(1, 10, 1:3, P) 
R> plot(X, type = "s") 

the plot of the trajectory is given in Figure 4.13. For continuous time Markov 
chains one can use the package msm, but we will only use this package to 
calculate the exponential matrix later. Next follows the code to simulate the 
diffusion process with Markov switching in Equation (4.6). This code is not 
very sophisticated, but contains the elementary blocks to construct an efficient 
simulator. 


R> simMSdiff <- function(x0, a0, S, delta, n, f, g, Q) { 
+     require(msm) 
+     P <- MatrixExp(delta * Q) 
+     alpha <- simMarkov(a0, n, S, P) 
+     x <- numeric(n + 1) 
+     x[1] <- x0 
+     for (i in 1:n) { 
+         A <- f(x[i], alpha[i]) * delta 
+         B <- g(x[i], alpha[i]) * sqrt(delta) * rnorm(1) 
+         x[i + 1] <- x[i] + A + B 
+     } 
+     ts(x, deltat = delta, start = 0) 
+ } 

The function simMSdiff has arguments x0 the initial value of the process, a0 
the starting value of the Markov chain $\alpha_t$, S the state space of $\alpha_t$, delta the 
discretization step, n the number of new observations to generate, f and g two 
R functions of two arguments of the form f(x, a) and g(x, a) where x is the 
state of $X_t$ and a the value of the Markov chain, and Q the generator of the 
continuous time Markov chain. Notice that the code is very elementary and it is 
up to the user to correctly specify the functions and the matrices involved. Just 
as an example, we try to simulate a nonlinear Markov switching diffusion (4.6) 
where $f(\cdot, \cdot)$ and $g(\cdot, \cdot)$ are as follows: 

$$f(x, a) = \begin{cases} (1 + \sin(x))\,x, & \text{if } a = 0, \\ (1 - \cos(x))\,x, & \text{if } a = 1, \end{cases} \qquad g(x, a) = \begin{cases} \sigma_0 \cdot x = 0.5 \cdot x, & \text{if } a = 0, \\ \sigma_1 \cdot x = 0.2 \cdot x, & \text{if } a = 1, \end{cases}$$

and the generator $Q$ is the following matrix: 

$$Q = \begin{pmatrix} -6 & 6 \\ 12 & -12 \end{pmatrix}$$

with initial values $(X_0, \alpha_0) = (5, 0)$. We set the discretization step $\Delta = 1/n$ 
with n = 1000. So, we proceed as follows: 

Figure 4.13 Example of plot of a Markov chain with three states simulated with 
simMarkov. 

R> Q <- matrix(c(-6, 12, 6, -12), 2, 2) 
R> n <- 1000 
R> s0 <- 0.5 
R> s1 <- 0.2 
R> f <- function(x, a) ifelse(a == 0, (1 + sin(x)) * x, (1 - cos(x)) * 
+     x) 
R> g <- function(x, a) ifelse(a == 0, s0 * x, s1 * x) 
R> set.seed(123) 
R> X <- simMSdiff(x0 = 5, a0 = 0, S = 0:1, delta = 1/n, n = 1000, 
+     f, g, Q) 
R> plot(X) 


The simulated path is shown in Figure 4.14. 
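
The one-step transition matrix $P = \exp(\varepsilon Q)$ can also be checked without the msm package. The sketch below uses a truncated Taylor series with scaling and squaring; the function name mat_exp is hypothetical and the routine is meant only as an illustration, not as a replacement for MatrixExp. For a two-state chain the result can be compared with the well-known closed form of the transition matrix.

```r
# Truncated Taylor series with scaling and squaring for exp(M); a sketch only.
mat_exp <- function(M, squarings = 10, terms = 20) {
    A <- M / 2^squarings                  # scale down so the series converges fast
    E <- diag(nrow(M))
    term <- diag(nrow(M))
    for (k in 1:terms) {
        term <- term %*% A / k            # A^k / k!
        E <- E + term
    }
    for (j in seq_len(squarings)) E <- E %*% E   # undo the scaling
    E
}

Q <- matrix(c(-6, 12, 6, -12), 2, 2)
delta <- 1 / 1000
P <- mat_exp(delta * Q)

# Closed form for a two-state chain with rates a (state 0 to 1) and b (1 to 0).
a <- 6
b <- 12
e <- exp(-(a + b) * delta)
P_exact <- matrix(c(b + a * e, b - b * e, a - a * e, a + b * e), 2, 2) / (a + b)
```

Each row of P sums to one, as it must for a transition probability matrix.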



Figure 4.14 Example of trajectory of a Markov switching diffusion process using 
simMSdiff. 

Solution 4.1 (to Exercise 4.1) Recall that the equation of the semi-circle is 
$g(x) = \sqrt{1 - x^2}$, $x \in [-1, 1]$. Given that the area of the circle is $\pi r^2$ where $r$ 
is the radius ($r = 1$ in this case), we can calculate $\pi$ as $\pi = 2 \cdot A \cdot S_n/n$ where 
$S_n$ and $A$ are as in Section 4.1.1. Then 

R> set.seed(123) 
R> g <- function(x) sqrt(1 - x^2) 
R> a <- -1 
R> b <- +1 
R> c <- 0 
R> d <- 2 
R> A <- (b - a) * (d - c) 
R> n <- 1e+05 
R> x <- runif(n, a, b) 
R> y <- runif(n, c, d) 
R> 2 * A * sum(y < g(x))/n 
[1] 3.14272 




R> 2 * integrate(g, a, b)$value 
[1] 3.141593 
R> pi 

[1] 3.141593 


4.7 Bibliographical notes 

Simulation for stochastic differential equations driven by Brownian motion can 
be found in the classical book of Kloeden and Platen (1999), Kloeden et al. (2000) 
and, in view of inference for these models, in Iacus (2008). For jump processes the 
literature is sparse, but we can mention at least Platen and Bruti-Liberati (2010), 
who consider a general approach to simulation of jump processes with a view 
on finance applications; the monographs of Cont and Tankov (2004) and Schoutens 
(2003), among others, also contain sections on simulation for these processes. 
For point processes one can find elements in Ripley (2006) and Ross (2006). 
Books on numerical methods with finance in view are, for example, Jäckel (2002) 
and Glasserman (2004). 

5 


Estimation of stochastic 
models for finance 


In this chapter we review some of the basic models used to model asset prices, 
market indexes, interest rates, etc. These processes may have different structures 
and we start from the very fundamental model of asset dynamics, the Black 
and Scholes (1973) and Merton (1973) model. 

5.1 Geometric Brownian motion 

Let us denote by $\{S_t = S(t), t \ge 0\}$ a stochastic process which represents the 
value of an asset (the asset price) at time $t \ge 0$. The process $S$ is called 
geometric Brownian motion if it is the solution to the following stochastic 
differential equation: 

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \qquad (5.1)$$

with some initial value $S_0$, and some constants $\mu$, $\sigma > 0$. As seen in Chapter 1, 
the fundamental idea was to describe the returns of $S_t$ in the interval $[t, t + dt)$ 
in terms of two components: 

$$\frac{S_{t+dt} - S_t}{S_t} = \frac{dS_t}{S_t} = \text{deterministic contribution} + \text{stochastic contribution}$$

where the deterministic contribution is assumed to be proportional to time, 
i.e. $\mu\,dt$, and the stochastic part is assumed to be of Gaussian type, i.e. $\sigma\,dW_t$. 
A simple rewriting of 

$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t,$$

gives the stochastic differential Equation (5.1). The constant $\mu$ represents the drift 
of the process and $\sigma^2$ is called volatility. We will see that, in more advanced 
models, both $\mu$ and $\sigma$ can be deterministic functions depending on the time $t$ or 
the state of the process $S_t$ or both, or some other kind of stochastic processes 
such that the solution of the corresponding stochastic differential equation exists. 
Exercise 3.14 proves that 



cr(s)dB s 


X, — Xq exp 


is the solution of the stochastic differential equation 

dZ, = b(t)X,dt + a(t)X,dB, 

of which Equation (5.1) is just a particular case. So, the explicit solution of the
geometric Brownian motion is

$$S_t = S_0 \exp\{\alpha t + \sigma B_t\}$$

where $\alpha = \mu - \frac{1}{2}\sigma^2$. Notice that, if $\sigma = 0$, i.e. in the absence of the
stochastic noise, Equation (5.1) becomes the simple ordinary differential equation

$$ds_t = \mu s_t\,dt$$

which we can easily solve by first rewriting it as follows:

$$\frac{ds_t}{dt} = \mu s_t$$

which, in the limit as $dt \to 0$, is equivalent to writing:

$$\frac{d}{dt}s_t = \mu s_t$$

or

$$\frac{s'_t}{s_t} = \frac{d}{dt}\log s_t = \mu$$

whose solution is

$$\log s_t = \log s_0 + \int_0^t \mu\,du$$

and, after integration, we get

$$s_t = s_0 \exp\{\mu t\}.$$

The main difference between the stochastic

$$S_t = S_0 \exp\left\{\mu t - \tfrac{1}{2}\sigma^2 t + \sigma B_t\right\}$$

and deterministic version is not only the stochastic term $\sigma B_t$ but also the
compensating factor $\frac{1}{2}\sigma^2 t$ which is the contribution of the Itô formula and makes
the stochastic version $S_t$ a well-behaved stochastic process.
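
The role of the compensating term can be checked numerically. The following is a minimal sketch in base R, not code from the sde package: the helper name gbm.path and all numerical values are assumptions made for illustration. It simulates the explicit solution on a grid and compares the Monte Carlo mean of $S_1$ with the deterministic value $s_0 e^{\mu}$.

```r
# Simulate geometric Brownian motion from its explicit solution
# S_t = S_0 * exp((mu - sigma^2/2) * t + sigma * B_t).
# gbm.path is a hypothetical helper introduced for this illustration.
gbm.path <- function(S0, mu, sigma, Tmax = 1, N = 100) {
  dt <- Tmax / N
  t <- dt * (0:N)
  # Brownian motion on the grid: cumulative sum of N(0, dt) increments
  B <- c(0, cumsum(rnorm(N, mean = 0, sd = sqrt(dt))))
  S0 * exp((mu - 0.5 * sigma^2) * t + sigma * B)
}

set.seed(123)
S0 <- 100; mu <- 0.1; sigma <- 0.2   # illustrative values
# terminal values of many independent paths
ST <- replicate(20000, tail(gbm.path(S0, mu, sigma, Tmax = 1, N = 50), 1))
mc.mean <- mean(ST)          # Monte Carlo estimate of E[S_1]
exact.mean <- S0 * exp(mu)   # deterministic solution s_1 = s_0 * exp(mu)
c(mc.mean, exact.mean)
```

Under these assumptions the two numbers should agree up to Monte Carlo error, which illustrates that the term $-\frac{1}{2}\sigma^2 t$ keeps the mean of the stochastic solution on the deterministic path.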

Two hypotheses which are always implicit in the Black and Scholes (1973) 
and Merton (1973) model are the following: 

(i) the past history is entirely reflected in the present value of the asset and 
does not contain information from the future; 

(ii) the market reacts immediately to each new piece of information on the asset.

Hence, what can be modeled is the effect of new information on the asset price,
i.e. the increments of the price process or the innovations. Indeed, in the previous
derivation of the geometric Brownian motion we imposed Gaussian returns.
Processes are usually supposed to be Markovian, as diffusion processes are.


5.1.1 Properties of the increments 

From the explicit solution of $S_t$ it is clear that the geometric Brownian motion
is nothing but the exponential of Brownian motion which in turn, for each $t$, is
a Gaussian random variable. In general, if $X \sim N$ and $c$ is some positive constant,
then $ce^X$ has a log-normal distribution, which means that its logarithm is Gaussian.
For this reason, the geometric Brownian motion is well suited for financial
applications where log-returns are more common than simple returns. To see this,
let us denote by $S_i = S(t_i)$ the asset price at time $t_i \in [0, T]$, with $T$ some
fixed time horizon and $t_0 = 0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = T$. The returns $Y_i$
can be simply written as

$$Y_i = \frac{S_i - S_{i-1}}{S_{i-1}} = \frac{S_i}{S_{i-1}} - 1, \quad i = 1, 2, \ldots, n.$$

Now, consider the Taylor expansion of $\log(1 + z)$

$$\log(1 + z) = z - \tfrac{1}{2}z^2 + o(z^2) \simeq z$$

if $z$ is very small. Now, let $S_i/S_{i-1} = 1 + Z_i = 1 + Y_i$, i.e. $Z_i = Y_i$. Then

$$\log\frac{S_i}{S_{i-1}} = \log(1 + Z_i) = \log(1 + Y_i) \simeq Y_i$$

which means that the true returns $Y_i$ are almost identical to the log-returns
$X_i = \log S_i - \log S_{i-1}$, $i = 1, 2, \ldots, n$.
These new values have very appealing properties. Let us denote with
$\Delta t = t_i - t_{i-1}$, $i = 1, \ldots, n$, the time increment between two observations
(usually $\Delta t = 1/252$ if the time reference is the year and the $S_i$ represent daily
data). The log-returns $X_i$ are additive, indeed

$$X_i + \cdots + X_{i+n-1} = \log\frac{S_{i+n-1}}{S_{i-1}} = \log S_{i+n-1} - \log S_{i-1}$$

while the $Y_i$ are not additive. Now, given that $S_t$ is a geometric Brownian motion,
we have that

$$X_i = X(t_i) = \log\left(\frac{S(t_i)}{S(t_{i-1})}\right) = \alpha\Delta t + \sigma(B(t_i) - B(t_{i-1})) \sim \alpha\Delta t + \sigma\sqrt{\Delta t}\,N(0, 1)$$

therefore

$$X_i \sim N(\alpha\Delta t, \sigma^2\Delta t).$$

Moreover, $X(t_i)$ is independent of $X(t_j)$ for $i \ne j$ because they depend on the
increments of the Brownian motion on disjoint intervals.
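
The two properties of log-returns used above, closeness to simple returns and additivity, can be verified on a handful of numbers. The sketch below uses base R only; the price vector is made up for illustration and is not data from the text.

```r
# Illustrative prices (assumed values, not data from the text)
S <- c(100, 101, 99.5, 100.2, 102, 101.3)
n <- length(S)

Y <- diff(S) / S[-n]   # simple returns Y_i = (S_i - S_{i-1}) / S_{i-1}
X <- diff(log(S))      # log-returns X_i = log S_i - log S_{i-1}

# log(1 + z) is close to z for small z, so X_i and Y_i nearly coincide
max.gap <- max(abs(X - Y))

# additivity: the log-returns sum to the log-return over the whole period
total.log <- log(S[n] / S[1])
all.equal(sum(X), total.log)

# simple returns do not add up to the simple return over the whole period
gap.simple <- sum(Y) - (S[n] / S[1] - 1)
```

With daily-sized moves of about one percent, `max.gap` is of the order of $z^2/2$, while `gap.simple` is small but not zero.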


5.1.2 Estimation of the parameters 

So if we consider the values $X_i$, $i = 1, \ldots, n$, they constitute a sample of i.i.d.
random variables with common distribution $N(\alpha\Delta t, \sigma^2\Delta t)$. From Chapter 2
we know how to obtain unbiased and efficient estimators of the mean and the
variance of a sample of i.i.d. random variables. For example,
$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$ is an estimator of $\alpha\Delta t$ and
$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2$ is an estimator of $\sigma^2\Delta t$, thus

$$\hat\alpha_n = \frac{1}{\Delta t}\bar X_n = \frac{1}{\Delta t}\frac{1}{n}\sum_{i=1}^n X_i, \qquad \hat\sigma_n^2 = \frac{1}{\Delta t}\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{S_n^2}{\Delta t}$$

from which we obtain immediately the following estimator of $\mu$

$$\hat\mu_n = \hat\alpha_n + \tfrac{1}{2}\hat\sigma_n^2.$$

In the next code we simulate a trajectory of the geometric Brownian motion and
we fit the parameters in an elementary way. To this end, we use the sde.sim
function specifying the option BS in the argument model, with a vector of
parameters theta = ($\mu = 1$, $\sigma = 0.5$), over a time interval $[0, T = 100]$, with
10000 equally spaced observations.
R> require(sde)
R> set.seed(123)
R> sigma <- 0.5
R> mu <- 1
R> S <- sde.sim(X0 = 100, model = "BS", theta = c(mu, sigma), T = 100,
+     N = 10000)

We then create the log-returns using diff(log(S)) and estimate the parameters

R> X <- diff(log(S))
R> sigma.hat <- sqrt(var(X)/deltat(S))
R> alpha.hat <- mean(X)/deltat(S)
R> mu.hat <- alpha.hat + 0.5 * sigma.hat^2
R> sigma.hat
[1] 0.4993183
R> mu.hat
[1] 0.9878009
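
The same estimators can be wrapped in a small function that works on any vector of equally spaced prices. This is a sketch in base R under stated assumptions: gbm.fit is a hypothetical helper name, and the test data are generated directly from the Gaussian law of the log-returns rather than with sde.sim.

```r
# gbm.fit is a hypothetical helper: S is a vector of prices observed every dt
gbm.fit <- function(S, dt) {
  X <- diff(log(S))              # log-returns
  alpha.hat <- mean(X) / dt      # estimator of alpha = mu - sigma^2 / 2
  sigma2.hat <- var(X) / dt      # var() uses the 1/(n-1) normalization
  c(mu = alpha.hat + 0.5 * sigma2.hat, sigma = sqrt(sigma2.hat))
}

set.seed(123)
mu <- 1; sigma <- 0.5; dt <- 0.01; n <- 10000   # same design as in the text
# log-returns are i.i.d. N(alpha * dt, sigma^2 * dt)
X <- rnorm(n, mean = (mu - 0.5 * sigma^2) * dt, sd = sigma * sqrt(dt))
S <- 100 * exp(c(0, cumsum(X)))
est <- gbm.fit(S, dt)
est
```

In this design the estimate of $\sigma$ is typically much more accurate than the estimate of $\mu$, because the precision of the drift estimate is governed by the length $T = n\,dt$ of the observation window rather than by the number of observations.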

5.2 Quasi-maximum likelihood estimation 

For diffusion processes that are solutions of stochastic differential equations and
are observed at discrete times, it is possible to define the likelihood function
making use of the Markov property. We consider a multidimensional diffusion process
and then particularize it to the one-dimensional case. Let $X$ be an $m$-dimensional
diffusion process solution to

$$dX_t = b(X_t, \theta)dt + \sigma(X_t, \theta)dB_t, \quad t \in [0, T], \qquad (5.2)$$

where $B_t$ is an $r$-dimensional standard Brownian motion, $b(x, \theta)$ is an
$\mathbb{R}^m$-valued function, and $\sigma(x, \theta)$ is an $m \times r$ matrix-valued function. The
parameter $\theta$ belongs to $\Theta$ which is a bounded domain in $\mathbb{R}^d$, $d \ge 1$. Let
$\{t_i, i = 0, 1, \ldots, n\}$ be a grid of times such that $t_0 = 0$ and $t_n = T$ and let us
denote $X_{t_i}$ by $X_i$, $i = 0, 1, \ldots, n$. We can construct the likelihood function
$L_n(\theta)$ of the Markovian process $X$ using the conditional distributions, i.e.
$p_\theta(\Delta_i, x|y)$ is the conditional density of $X_i$ given $X_{i-1} = y$ with
$\Delta_i = t_i - t_{i-1}$. Indeed, we can write

$$L_n(\theta) = \prod_{i=1}^n p_\theta(\Delta_i, X_i|X_{i-1})\,p_\theta(X_0), \qquad (5.3)$$

where $p_\theta(X_0)$ is the distribution of the initial value $X_0$. For example, if $X$
is the one-dimensional geometric Brownian motion, then $\theta = (\mu, \sigma)$ and the
conditional density is the log-Normal distribution, while the initial condition can
be taken as nonrandom so $p_\theta(X_0) = 1$. In general, the initial distribution $p_\theta(X_0)$
is either discarded or taken as equal to one in the asymptotic theory. This is due
to the fact that it is usually impossible to estimate it from the data and, given the
high number of observations in the asymptotic framework, it is assumed that the
initial distribution gives a marginal and small contribution to the full likelihood
$L_n(\theta)$. As usual, we denote by $\ell_n(\theta) = \log L_n(\theta)$ the log-likelihood function

$$\ell_n(\theta) = \log L_n(\theta) = \sum_{i=1}^n \ell_i(\theta) + \log(p_\theta(X_0)) = \sum_{i=1}^n \log p_\theta(\Delta_i, X_i|X_{i-1}) + \log(p_\theta(X_0))$$

and, from now on, we take, without any further comment, $p_\theta(X_0) = 1$, thus

$$\ell_n(\theta) = \sum_{i=1}^n \log p_\theta(\Delta_i, X_i|X_{i-1}). \qquad (5.4)$$

i=i 

When the transition densities $p_\theta(\Delta, x|y)$ are known in explicit form, it is possible
to proceed with exact likelihood inference although the results depend, clearly,
on the regularity of the drift and diffusion coefficients, but also on the asymptotic
scheme. Unfortunately, the likelihood function is rarely known in explicit form
and, when known, it is usually for the one-dimensional case. In any case, when the
model is regular, the MLE based on the exact likelihood function possesses
the usual good properties, but we need to distinguish between parameters in the
drift and in the diffusion coefficient. For the parameters in the diffusion coefficient,
the rate of convergence is $\sqrt{n}$ in all asymptotic schemes. In this case, the MLE
is also asymptotically efficient as in the i.i.d. case. In the case of the drift, the rate
of convergence of (any) estimator is of order $\sqrt{T}$ and, if $T$ is not large enough
or is fixed, it is impossible to obtain consistent estimates of the drift parameters.
Luckily enough, the most important quantity in finance is volatility rather than
the trend. Usually the grid of time points is such that $n\Delta_n = T$, where $\Delta_n$ is
taken equal for all time points, i.e. $t_i - t_{i-1} = \Delta_n$ and $t_i = i\Delta_n$, $i = 0, 1, \ldots, n$.
In this case, the rate of convergence of the drift is written as $\sqrt{n\Delta_n}$ and hence,
it is required that $n\Delta_n = T \to \infty$ in order to get consistent estimators of the
parameters along with some conditions on the stationarity or ergodicity of the
process $X$.

Here we present only quasi-maximum likelihood estimation properties
for two different asymptotic schemes. For more details on other approaches like
estimating functions, method of moments, etc., the reader can consult Iacus
(2008), Phillips and Yu (2009) and Sørensen (2009).

Let us consider $\Delta_n \to 0$, $n\Delta_n = T \to \infty$ as $n \to \infty$. Consider the
Euler-Maruyama discretization (4.5) for the time-homogeneous stochastic differential
equation (5.2). Then, the conditional distribution $p_\theta(\Delta_n, X_i|X_{i-1})$ is multivariate
Gaussian with mean $X_{i-1} + b(X_{i-1}, \theta)\Delta_n$ and variance-covariance matrix
$\Delta_n S(X_{i-1}, \theta)$ with $S(x, \theta) = \sigma(x, \theta)^{\otimes 2}$. Here, for a matrix $A$, we denote by
$A^\top$ the transpose of $A$, by $\mathrm{Tr}(A)$ the trace of $A$, $A^{\otimes 2} = A \cdot A^\top$ and by $A^{-1}$ the
inverse of $A$. Then, the corresponding approximated log-likelihood function for the
Euler-Maruyama scheme takes the form:

$$\ell_n(\theta) = \sum_{i=1}^n \left\{\log\det S(X_{i-1}, \theta) + \Delta_n^{-1} S(X_{i-1}, \theta)^{-1}\left[(\Delta X_i - b(X_{i-1}, \theta)\Delta_n)^{\otimes 2}\right]\right\}$$

where $\Delta X_i = X_i - X_{i-1}$, $M[u^{\otimes 2}] = u^\top M u$ for a matrix $M$ and a vector $u$, and
we have dropped the terms with the constants $\frac{1}{2}$ and $\sqrt{2\pi}$. The quasi-maximum
likelihood estimator is the solution of

$$\hat\theta_n = \arg\min_\theta \ell_n(\theta).$$
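
To make the Euler-Maruyama quasi-likelihood concrete, here is a minimal one-dimensional sketch in base R. It is not the implementation used by the sde or yuima packages: the model $dX_t = (\theta_1 - \theta_2 X_t)dt + \theta_3 dB_t$, the true values, the starting point and the bounds are all assumptions chosen for illustration, and the minimization is done with the general-purpose optimizer optim.

```r
# Quasi-likelihood sketch for dX = (theta1 - theta2 * X)dt + theta3 * dB
set.seed(123)
theta <- c(1, 2, 0.5)          # assumed "true" values for the illustration
n <- 5000; Delta <- 0.01       # n * Delta = T = 50

# simulate with the exact Gaussian transition of this linear model
m <- theta[1] / theta[2]
rho <- exp(-theta[2] * Delta)
s <- theta[3] * sqrt((1 - rho^2) / (2 * theta[2]))
X <- numeric(n + 1)
X[1] <- m
for (i in 1:n) X[i + 1] <- m + (X[i] - m) * rho + s * rnorm(1)

# Euler-Maruyama quasi-likelihood (constants dropped), to be minimized
quasi.lik <- function(par) {
  dX <- diff(X)
  x0 <- X[-length(X)]
  b <- par[1] - par[2] * x0    # drift b(X_{i-1}, theta)
  S <- par[3]^2                # S = sigma^2 in dimension one
  sum(log(S) + (dX - b * Delta)^2 / (Delta * S))
}

fit <- optim(c(0.5, 1, 1), quasi.lik, method = "L-BFGS-B",
             lower = rep(1e-3, 3), upper = rep(10, 3))
theta.hat <- fit$par
theta.hat
```

In runs of this kind the diffusion parameter $\theta_3$ is usually recovered accurately, while the drift parameters are noisier, in line with the $\sqrt{n}$ versus $\sqrt{n\Delta_n}$ rates discussed above.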


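In order to get good asymptotic properties for $\hat\theta_n$ it is also required that
$n\Delta_n^2 \to 0$, together with several other regularity conditions which we do not
mention here but can be found in the above-mentioned references. The only condition
that needs to be made explicit is the separation of the vector $\theta$ into two
subvectors, i.e. $\theta = (\alpha, \beta)$, with $\alpha \in \Theta_p$ and $\beta \in \Theta_q$ such that
$\Theta_p \times \Theta_q = \Theta$. Let us denote by $\mu_\theta(\cdot)$ the invariant measure of the process
and define the Fisher information matrix for this experiment as

$$\mathcal{I}(\theta) = \begin{pmatrix} [\mathcal{I}_b^{kj}(\alpha)]_{k,j=1,\ldots,p} & 0 \\ 0 & [\mathcal{I}_\sigma^{kj}(\beta)]_{k,j=1,\ldots,q} \end{pmatrix}$$

where

$$\mathcal{I}_b^{kj}(\alpha) = \int \frac{1}{\sigma^2(\beta, x)} \frac{\partial b(\alpha, x)}{\partial\alpha_k} \frac{\partial b(\alpha, x)}{\partial\alpha_j}\,\mu_\theta(dx),$$

$$\mathcal{I}_\sigma^{kj}(\beta) = 2\int \frac{1}{\sigma^2(\beta, x)} \frac{\partial\sigma(\beta, x)}{\partial\beta_k} \frac{\partial\sigma(\beta, x)}{\partial\beta_j}\,\mu_\theta(dx).$$

Consider the matrix

$$\varphi(n) = \begin{pmatrix} \frac{1}{n\Delta_n} I_p & 0 \\ 0 & \frac{1}{n} I_q \end{pmatrix}$$

where $I_p$ and $I_q$ are respectively the identity matrices of order $p$ and $q$. Then, it is
possible to show that

Theorem 5.2.1 (Ergodic case)

$$\varphi(n)^{-1/2}(\hat\theta_n - \theta) \overset{d}{\to} N(0, \mathcal{I}(\theta)^{-1}).$$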


The proof of this theorem can be found, e.g., in Yoshida (1992), where the author
also presented milder hypotheses and showed that, given that the estimator of
$\beta$ has a faster rate of convergence, one can first estimate $\beta$ with $\hat\beta_n$ using a
partial quasi-likelihood function and then plug $\hat\beta_n$ into the above quasi-likelihood
function and estimate $\alpha$. In the case when $T$ is fixed and hence $\Delta_n \to 0$ but $n\Delta_n$
does not diverge, the asymptotic theory changes and, in particular, only
the parameter $\beta$ in the volatility can be estimated efficiently. In this case, there
is no need to assume ergodicity of the process $X$ but the problem is now that
the Fisher information matrix is a random matrix and the limit theorems are of
mixed-normal type. In this case, we have this limit theorem for the quasi-MLE,
with all parameters $\theta$ in the diffusion coefficient

Theorem 5.2.2 (Pure high frequency)

$$\sqrt{n}(\hat\theta_n - \theta) \overset{d}{\to} \Gamma_d\,N(0, I_d)$$

with $I_d$ a $d \times d$ identity matrix and $\Gamma_d$ a random matrix. The asymptotic result is
based on the stable convergence of Jacod (1997, 2002). There are several versions
of this result proved under different regularity conditions and one milestone
reference is surely Genon-Catalot and Jacod (1993).

From the numerical point of view, the two cases are equivalent in that there is
only the need to maximize the quasi-likelihood. The difference is in the estimation
of the standard errors of the estimates, but these are usually obtained by numerical
approximation of the Hessian matrix. The package yuima implements the
quasi-likelihood method for multidimensional diffusion processes and the package
sde implements exact, quasi-likelihood and other approximation schemes for
the one-dimensional case. In some cases, e.g. when the $\Delta$ between observations
is not so small and the diffusion process is one-dimensional, exact methods and
other approximation schemes based on asymptotic expansions, which are available
through the sde package, are preferable. In the multidimensional case,
there is no other option than the yuima package at present. Here we present an
example based on the function qmle for quasi-maximum likelihood estimation
of multidimensional diffusion processes.


R> require(yuima)
R> diff.matrix <- matrix(c("alpha1", "alpha2", "1", "1"), 2, 2)
R> drift.c <- c("-1*beta1*x1", "-1*beta2*x2", "-1*beta2", "-1*beta1")
R> drift.matrix <- matrix(drift.c, 2, 2)
R> ymodel <- setModel(drift = drift.matrix, diffusion = diff.matrix,
+     time.variable = "t", state.variable = c("x1", "x2"),
+     solve.variable = c("x1", "x2"))
R> n <- 100
R> ysamp <- setSampling(Terminal = (n)^(1/3), n = n)
R> yuima <- setYuima(model = ymodel, sampling = ysamp)
R> set.seed(123)
R> truep <- list(alpha1 = 0.5, alpha2 = 0.3, beta1 = 0.6, beta2 = 0.2)
R> yuima <- simulate(yuima, xinit = c(1, 1), true.parameter = truep)
R> opt <- qmle(yuima, start = list(alpha1 = 0.7, alpha2 = 0.2, beta1 = 0.8,
+     beta2 = 0.2))

and now opt is an object of class mle and we can extract the estimates of our
parameters

R> opt@coef
   alpha1    alpha2     beta1     beta2
0.3842415 0.1950523 0.5899493 0.1670455
R> unlist(truep)
alpha1 alpha2  beta1  beta2
   0.5    0.3    0.6    0.2

5.3 Short-term interest rates models 

In the option pricing formulas of the standard Black and Scholes (1973) and
Merton (1973) model, which we discuss in Chapter 6, the short-term interest rate
is assumed to be constant. But, for example, if we look at Figure 5.1 we see
that the U.S. interest rates are clearly changing over time. Thus, it makes sense
to try to model interest rates using stochastic processes like diffusion processes
that are solutions to stochastic differential equations. Figure 5.1 is obtained using
the R package Ecdat and the following code

R> library(Ecdat)
R> library(sde)
R> data(Irates)
R> X <- Irates[, "r1"]
R> plot(X)


Figure 5.1 The U.S. Interest Rates monthly data from 06/1964 to 12/1989.

Here, we briefly introduce the most commonly known models and their properties.
We start with the general Chan et al. (1992) model (CKLS) which encompasses
many other sub-models presented in the literature (see Table 5.1). The CKLS
process is the solution to the following stochastic differential equation:

$$dX_t = (\alpha + \beta X_t)dt + \sigma X_t^\gamma dB_t.$$

The parameters must satisfy the constraints $\sigma > 0$, $\gamma \ge 0$, while $\alpha$ and $\beta$ may,
in principle, take any real value, but for meaningful applications, the process must
be non-negative. This process is non-negative when $\alpha > 0$, $\beta \le 0$, $\gamma > \frac{1}{2}$ and
the initial condition is $X_0 > 0$. Processes of this type may possess the mean
reverting property when they tend to oscillate around some mean level represented
by the parameter $\alpha$. The speed at which the process tends toward the mean level is
the speed of mean reversion, and this is captured by the parameter $\beta$. Table 5.2
shows conditions on the parameters and the property of mean reversion for the
different sub-models.
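
Since the sub-models differ only in the values of $(\alpha, \beta, \gamma)$, a single Euler-Maruyama routine can simulate any of them. The following base R sketch is an illustration under stated assumptions: ckls.euler is a hypothetical helper (not a function of the sde package), the guard against negative values is a pragmatic choice for the discretized path, and the parameter values are made up.

```r
# Euler-Maruyama path of dX = (alpha + beta * X)dt + sigma * X^gamma dB.
# ckls.euler is a hypothetical helper, not a function of the sde package.
ckls.euler <- function(x0, alpha, beta, sigma, gamma, Delta, n) {
  X <- numeric(n + 1)
  X[1] <- x0
  Z <- rnorm(n)
  for (i in 1:n) {
    xi <- max(X[i], 0)   # guard: the discretized path may dip below zero
    X[i + 1] <- X[i] + (alpha + beta * X[i]) * Delta +
      sigma * xi^gamma * sqrt(Delta) * Z[i]
  }
  X
}

set.seed(1)
# gamma = 1/2 gives the Cox et al. (1985) sub-model; values are illustrative
X <- ckls.euler(x0 = 0.5, alpha = 0.5, beta = -1, sigma = 0.2,
                gamma = 0.5, Delta = 0.01, n = 20000)
long.run <- mean(X)   # compare with -alpha / beta = 0.5
long.run
```

Setting gamma = 0 in the same call gives a Vasicek-type path and gamma = 1 a Brennan and Schwartz-type path, so the routine can be used to compare the behavior of the sub-models listed in the tables.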


Table 5.1 The family of one-factor short-term interest rates models seen as
special cases of the general CKLS model.

Reference                     Model                                                       $\alpha$  $\beta$   $\gamma$
Merton (1973)                 $dX_t = \alpha\,dt + \sigma\,dB_t$                                    0         0
Vasicek (1977)                $dX_t = (\alpha + \beta X_t)dt + \sigma\,dB_t$                                  0
Cox et al. (1985)             $dX_t = (\alpha + \beta X_t)dt + \sigma\sqrt{X_t}\,dB_t$                        1/2
Dothan (1978)                 $dX_t = \sigma X_t\,dB_t$                                   0         0         1
Geometric Brownian Motion     $dX_t = \beta X_t\,dt + \sigma X_t\,dB_t$                   0                   1
Brennan and Schwartz (1980)   $dX_t = (\alpha + \beta X_t)dt + \sigma X_t\,dB_t$                              1
Cox et al. (1980)             $dX_t = \sigma X_t^{3/2}\,dB_t$                             0         0         3/2
Constant Elasticity Variance  $dX_t = \beta X_t\,dt + \sigma X_t^\gamma\,dB_t$            0
Chan et al. (1992)            $dX_t = (\alpha + \beta X_t)dt + \sigma X_t^\gamma\,dB_t$




Table 5.2 Mean reversion property of the members in the family of CKLS
model.

Model                         $\alpha$       $\beta$        $\gamma$       mean reverting?
Merton (1973)                 $\mathbb{R}$   0              0              no
Vasicek (1977)                $\mathbb{R}$   $\mathbb{R}$   0              yes
Cox et al. (1985)             $\mathbb{R}$   $\mathbb{R}$   1/2            yes
Dothan (1978)                 0              0              1              no
Geometric Brownian Motion     0              $\mathbb{R}$   1              yes
Brennan and Schwartz (1980)   $\mathbb{R}$   $\mathbb{R}$   1              yes
Cox et al. (1980)             0              0              3/2            no
Constant Elasticity Variance  0              $\mathbb{R}$   $\mathbb{R}$   yes










5.3.1 The special case of the CIR model 

Introduced in Cox et al. (1985), this is one of the few examples in the class CKLS
for which the transition density is known and hence exact maximum likelihood
estimation is possible. Indeed, the conditional density of $X_{t+\Delta}|X_t = x$ for the
Cox-Ingersoll-Ross (CIR) process solution of

$$dX_t = (\alpha + \beta X_t)dt + \sigma\sqrt{X_t}\,dB_t$$

is a noncentral $\chi^2$ distribution, $\beta < 0$, $\sigma > 0$. In particular, the transition density
$p_\theta(\Delta, y|x)$ can be rewritten in terms of the transition density of $Y_t = 2cX_t$,
which has a $\chi^2$ distribution with $\nu = 4\alpha/\sigma^2$ degrees of freedom and noncentrality
parameter $Y_s e^{\beta t}$, where $c = -2\beta/(\sigma^2(1 - e^{\beta t}))$. The process also has an explicit
solution of the form:

$$X_t = \left(X_0 + \frac{\alpha}{\beta}\right)e^{\beta t} - \frac{\alpha}{\beta} + \sigma e^{\beta t}\int_0^t e^{-\beta u}\sqrt{X_u}\,dB_u$$

and, when it exists, the stationary distribution of the CIR process is a Gamma law
with shape parameter $2\alpha/\sigma^2$ and scale parameter $-\sigma^2/(2\beta)$. Hence the stationary
law has mean equal to $-\alpha/\beta$ and variance $\alpha\sigma^2/(2\beta^2)$. The sde package implements
[rpdq]cCIR and [rpdq]sCIR for random number generation, cumulative distribution
function, density function, and quantile calculations, respectively, for the
conditional and stationary laws. It also provides an explicit simulation scheme
in the sde.sim function. One should know that the parametrization in the sde
package is the following:

$$dX_t = (\theta_1 - \theta_2 X_t)dt + \theta_3\sqrt{X_t}\,dB_t.$$

Thus, for example, if we want to simulate the process with $\theta = (1, 0.3, 0.1)$ we
should use

R> sde.sim(model = "CIR", theta = c(1, 0.3, 0.1))

If we want to fit this model to the U.S. Interest Rates monthly data above, we
can use exact maximum likelihood estimation. Thus, we can construct the true
likelihood with

R> CIR.loglik <- function(theta1, theta2, theta3) {
+     n <- length(X)
+     dt <- deltat(X)
+     -sum(dcCIR(x = X[-1], Dt = dt, x0 = X[-n], theta = c(theta1,
+         theta2, theta3), log = TRUE))
+ }

and then use the standard mle function to estimate the parameters

R> fit <- mle(CIR.loglik, start = list(theta1 = 0.1, theta2 = 0.1,
+     theta3 = 0.3), method = "L-BFGS-B", lower = rep(0.001, 3),
+     upper = rep(1, 3))
R> coef(fit)
   theta1    theta2    theta3
0.9194592 0.1654958 0.8255179
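
The noncentral $\chi^2$ transition law also gives an exact way to simulate the CIR process without discretization error. The sketch below uses only base R and the $(\alpha, \beta, \sigma)$ parametrization of the equation $dX_t = (\alpha + \beta X_t)dt + \sigma\sqrt{X_t}\,dB_t$ with $\beta < 0$; cir.step is a hypothetical helper and the numerical values are illustrative assumptions. In the sde parametrization these correspond to $\theta = (\alpha, -\beta, \sigma)$.

```r
# One exact step of the CIR process: given X_t = x,
# 2 * cc * X_{t+Delta} is noncentral chi-square with
# df = 4 * alpha / sigma^2 and ncp = 2 * cc * x * exp(beta * Delta),
# where cc = -2 * beta / (sigma^2 * (1 - exp(beta * Delta))).
cir.step <- function(x, Delta, alpha, beta, sigma) {
  cc <- -2 * beta / (sigma^2 * (1 - exp(beta * Delta)))
  df <- 4 * alpha / sigma^2
  ncp <- 2 * cc * x * exp(beta * Delta)
  rchisq(length(x), df = df, ncp = ncp) / (2 * cc)
}

set.seed(123)
alpha <- 1; beta <- -0.3; sigma <- 0.5; Delta <- 1   # illustrative values

# conditional mean check: many one-step draws starting from x = 2
x1 <- cir.step(rep(2, 1e5), Delta, alpha, beta, sigma)
cond.mean <- (2 + alpha / beta) * exp(beta * Delta) - alpha / beta

# a long path: its average should approach the stationary mean -alpha / beta
n <- 20000
X <- numeric(n + 1)
X[1] <- 2
for (i in 1:n) X[i + 1] <- cir.step(X[i], Delta, alpha, beta, sigma)
c(mean(x1), cond.mean, mean(X), -alpha / beta)
```

The first pair of numbers compares the simulated and theoretical conditional means implied by the explicit solution, and the second pair compares the long-run average with the mean of the stationary Gamma law.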

It is also possible to estimate the full CKLS model using the quasi-maximum 
likelihood estimator. In Section 9.3 we discuss a model selection strategy for the 
parameters of the CKLS model making use of the yuima package. 

5.3.2 Ahn-Gao model 

The CKLS model introduces the mean reverting property but still assumes a linear
drift. The literature has proposed several alternative models which also admit a
nonlinear drift. One example is given by the process solution to the stochastic
differential equation

$$dX_t = X_t\left(\theta_1 - (\theta_3^3 - \theta_1\theta_2)X_t\right)dt + \theta_3 X_t^{3/2}\,dB_t.$$

The conditional distribution of this process is also known and it reads as

$$p_\theta(t, y|x_0) = \frac{1}{y^2}\,p_\theta^{CIR}\left(t, \frac{1}{y}\,\Big|\,\frac{1}{x_0}\right),$$

where $p_\theta^{CIR}$ is the conditional density of the CIR model of the previous section. It is
left as an exercise to construct the maximum likelihood estimator for this model.

Exercise 5.1 Write the R code to obtain the maximum likelihood estimator of the
Ahn-Gao model.


5.3.3 Aït-Sahalia model

Later Aït-Sahalia (1996a, b) proposed a more sophisticated model to include
other polynomial terms. The simplest version of the Aït-Sahalia model is a diffusion
process solution to the stochastic differential equation

$$dX_t = (\alpha_{-1}X_t^{-1} + \alpha_0 + \alpha_1 X_t + \alpha_2 X_t^2)dt + \beta_1 X_t^\rho\,dB_t$$

which is further generalized into

$$dX_t = (\alpha_{-1}X_t^{-1} + \alpha_0 + \alpha_1 X_t + \alpha_2 X_t^2)dt + \sqrt{\beta_0 + \beta_1 X_t + \beta_2 X_t^{\beta_3}}\,dB_t.$$
Unfortunately, there are natural but complex constraints that the set of coefficients
in the drift and diffusion coefficient must satisfy in order to have a well-defined
stochastic differential equation. We refer the reader to the original paper. Suppose
these constraints are in place. Then the problem is how to estimate this highly
parametrized model. One approach is to use quasi-maximum likelihood estimation
with the LASSO method as in Section 9.3. Another approach is to use the
two stage least squares approach in the following manner. First estimate the drift
coefficients with a simple linear regression like

$$E(X_{t+1} - X_t|X_t) = \alpha_{-1}X_t^{-1} + \alpha_0 + (\alpha_1 - 1)X_t + \alpha_2 X_t^2$$

then, regress the squared residuals $\epsilon_{t+1}^2$ from the first regression to obtain the
estimates of the coefficients in the diffusion term with

$$E(\epsilon_{t+1}^2|X_t) = \beta_0 + \beta_1 X_t + \beta_2 X_t^{\beta_3}$$

and finally, use the fitted values from the last regression to set the weights in the
second stage regression for the drift. Here is an example of R code to obtain the
two stage least squares regression as proposed in Aït-Sahalia (1996b).


R> library(Ecdat)
R> data(Irates)
R> X <- Irates[, "r1"]
R> Y <- X[-1]
R> stage1 <- lm(diff(X) ~ I(1/Y) + Y + I(Y^2))
R> coef(stage1)
 (Intercept)       I(1/Y)            Y       I(Y^2)
 0.253478884 -0.135520883 -0.094028341  0.007892979
R> eps2 <- residuals(stage1)^2
R> mod <- nls(eps2 ~ b0 + b1 * Y + b2 * Y^b3, start = list(b0 = 1,
+     b1 = 1, b2 = 1, b3 = 0.5), lower = rep(1e-05, 4), upper = rep(2,
+     4), algorithm = "port")
R> w <- predict(mod)
R> stage2 <- lm(diff(X) ~ I(1/Y) + Y + I(Y^2), weights = 1/w)
R> coef(stage2)
 (Intercept)       I(1/Y)            Y       I(Y^2)
 0.189414678 -0.104711262 -0.073227254  0.006442481
R> coef(mod)
        b0         b1         b2         b3
0.00001000 0.09347354 0.00001000 0.00001000

These estimates are biased because the empirical conditional moments of
the discretized stochastic differential equations do not coincide with the real
conditional moments, but they can still provide a set of initial values to pass
to the quasi-maximum likelihood optimizer, although most algorithms are very
unlikely to reach satisfactory solutions in such high dimensions.

R> summary(stage2)

Call:
lm(formula = diff(X) ~ I(1/Y) + Y + I(Y^2), weights = 1/w)

Residuals:
     Min       1Q   Median       3Q      Max
-4.86672 -0.25629  0.04789  0.40901  2.74668

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.189415   0.063891   2.965  0.00317 **
I(1/Y)      -0.104711   0.026529  -3.947 8.99e-05 ***
Y           -0.073227   0.025867  -2.831  0.00482 **
I(Y^2)       0.006442   0.002249   2.865  0.00434 **

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8019 on 526 degrees of freedom
Multiple R-squared: 0.03467, Adjusted R-squared: 0.02917
F-statistic: 6.298 on 3 and 526 DF, p-value: 0.0003339


R> summary(mod)

Formula: eps2 ~ b0 + b1 * Y + b2 * Y^b3

Parameters:
    Estimate Std. Error  t value Pr(>|t|)
b0   0.00001  151.78899 6.59e-08        1
b1   0.09347    0.01940    4.818  1.9e-06
b2   0.00001  151.77379 6.59e-08        1
b3   0.00001  588.06245 1.70e-08        1

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.305 on 526 degrees of freedom

Algorithm "port", convergence message: relative convergence (4)


From the above we see that the two stage approach selects the model

$$dX_t = (\alpha_{-1}X_t^{-1} + \alpha_0 + \alpha_1 X_t + \alpha_2 X_t^2)dt + \sqrt{\beta_1 X_t}\,dB_t$$

and we can consider estimating the parameters further by quasi-maximum
likelihood estimation.

5.4 Exponential Lévy model

As seen in Section 4.5.5 the common form of Lévy processes studied in empirical
finance is the exponential Lévy process

$$S_t = S_0 e^{Z_t} \qquad (5.5)$$

so that the log-returns of the price process $S_t$, i.e. $\log(S_{t+s}/S_t) = Z_{t+s} - Z_t$, are
distributed as the increments of the Lévy process $Z_t$. The package fBasics offers
several routines to fit the parameters of several distributions commonly adopted in
finance, for example nigFit for the Normal Inverse Gaussian distribution,
hypFit for the hyperbolic distribution, ghFit for the generalized hyperbolic
distribution, stableFit for the stable distribution and others. The idea is to take
the log-returns of the stochastic process $S_t$ and fit one of these distributions.
As an example, we try to fit the Gaussian, the Normal Inverse Gaussian, the
hyperbolic and the generalized hyperbolic distribution to the ENI.MI prices for
the year 2009.

R> require(fImport)
R> data <- yahooSeries("ENI.MI", from = "2009-01-01", to = "2009-12-31")
R> S <- data[, "ENI.MI.Close"]
R> X <- returns(S)

The time series and the returns are depicted in Figure 5.2 via the quantmod
package using these commands

R> require(quantmod)
R> lineChart(S, layout = NULL, theme = "white")
R> lineChart(X, layout = NULL, theme = "white")


Figure 5.3 shows the plot of the estimated densities on the logarithmic scale.
We can see that both the Gaussian model (which means the geometric Brownian
motion process) and the generalized hyperbolic distribution provide a very bad
fit, while the Normal Inverse Gaussian and the hyperbolic distribution provide a
similar fit.

R> library(fBasics)
R> nFit(X)
R> nigFit(X, trace = FALSE)
R> hypFit(X, trace = FALSE)
R> ghFit(X, trace = FALSE)

5.4.1 Examples of Levy models in finance 

As seen, log-returns of the exponential Lévy process allow us to estimate
the parameters of the underlying distribution of $Z_1$. Here we present a small
collection of Lévy processes $Z_t$ which are often used in finance to construct
the exponential Lévy process in (5.5), although we have anticipated, in the
previous section, ways to estimate their parameters. We present their canonical
decomposition as well as their characteristic functions (when not enumerated in
Section 2.3.3).

Figure 5.2 Asset prices and log-returns of the time series ENI.MI for the year
2009.

Figure 5.3 Several fitted densities on the logarithm scale for the series of returns
of the ENI.MI title of Figure 5.2.

5.4.1.1 Generalized hyperbolic process

Making use of the GH distribution (2.13), we can construct a Lévy process $Z_t$
such that $Z_1 \sim GH$ with Itô-Lévy decomposition

$$Z_t = tEZ_1 + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^{GH})(ds, dx)$$

where $\nu^{GH}(\cdot)$ is the following Lévy measure

$$\nu^{GH}(dx) = \frac{e^{\beta x}}{|x|}\left(\int_0^\infty \frac{\exp\left\{-\sqrt{2y + \alpha^2}\,|x|\right\}}{\pi^2 y\left(J_{|\lambda|}^2(\delta\sqrt{2y}) + Y_{|\lambda|}^2(\delta\sqrt{2y})\right)}\,dy + \lambda e^{-\alpha|x|}\mathbf{1}_{\{\lambda \ge 0\}}\right)dx$$

where $J_\lambda$ and $Y_\lambda$ denote the Bessel functions of the first and the second kind with
index $\lambda$ (see Abramowitz and Stegun 1964). Its Lévy triplet is $(EZ_1, 0, \nu^{GH})$.
For more details on this process see also Raible (2000).


5.4.1.2 Tempered stable process

This process is such that $Z_1 \sim TS(\kappa, \alpha, \beta)$, the tempered stable distribution with
characteristic function

$$\varphi_{Z_1}(t) = E e^{itZ_1} = \exp\left\{\alpha\beta - \alpha\left(\beta^{1/\kappa} - 2it\right)^\kappa\right\}$$

with density not available in explicit form. This law is such that

$$EZ_1 = 2\alpha\kappa\beta^{(\kappa-1)/\kappa}, \qquad \mathrm{Var}Z_1 = 4\alpha\kappa(1 - \kappa)\beta^{(\kappa-2)/\kappa}.$$

The process has Itô-Lévy decomposition

$$Z_t = tEZ_1 + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^{TS})(ds, dx)$$

where $\nu^{TS}(\cdot)$ is the following Lévy measure

$$\nu^{TS}(dx) = \alpha 2^\kappa \frac{\kappa}{\Gamma(1 - \kappa)}\,x^{-\kappa-1}\exp\left\{-\tfrac{1}{2}\beta^{1/\kappa}x\right\}\mathbf{1}_{\{x > 0\}}\,dx.$$

Its Lévy triplet is given by $(EZ_1, 0, \nu^{TS})$. The tempered stable distribution was
introduced in Tweedie (1984) and extended later in many directions in
Barndorff-Nielsen and Shephard (2001).

5.4.1.3 Normal inverse Gaussian process

In this case the Lévy process $Z_t$ is such that $Z_1 \sim NIG$ from (2.12), with Itô-Lévy
decomposition

$$Z_t = tEZ_1 + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^{NIG})(ds, dx)$$

where $\nu^{NIG}(\cdot)$ is the following Lévy measure

$$\nu^{NIG}(dx) = e^{\beta x}\frac{\delta\alpha}{\pi|x|}K_1(\alpha|x|)\,dx.$$

Its Lévy triplet is given by $(EZ_1, 0, \nu^{NIG})$. See Barndorff-Nielsen (1997).

5.4.1.4 Meixner process

The Meixner process with $Z_1 \sim$ Meixner as in (2.14), has the following Lévy
measure

$$\nu^{Meixner}(dx) = \delta\,\frac{e^{\beta x/\alpha}}{x\sinh\left(\frac{\pi x}{\alpha}\right)}\,dx$$

with Itô-Lévy decomposition

$$Z_t = tEZ_1 + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^{Meixner})(ds, dx)$$

and triplet $(EZ_1, 0, \nu^{Meixner})$. See Schoutens (2003) and Schoutens and Teugels
(1998).

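5.4.1.5 Variance Gamma process

The Variance Gamma is a pure jump process with characteristic function of $Z_1$
in the form:

$$\varphi_{Z_1}(t) = E e^{itZ_1} = \left(1 - i\theta\kappa t + \tfrac{1}{2}\sigma^2\kappa t^2\right)^{-1/\kappa}.$$

The parameters $\theta$ and $\kappa$ control respectively the skewness and the kurtosis of the
increments of $Z_t$. This process has Itô-Lévy decomposition

$$Z_t = tEZ_1 + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^{VG})(ds, dx)$$

and triplet $(EZ_1, 0, \nu^{VG})$ with Lévy measure

$$\nu^{VG}(dx) = \frac{1}{\kappa|x|}\exp\left\{\frac{\theta}{\sigma^2}x - \frac{\sqrt{\theta^2 + 2\sigma^2/\kappa}}{\sigma^2}|x|\right\}dx.$$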
The first two moments of the Variance Gamma process are

$$EZ_t = \theta t, \qquad \mathrm{Var}Z_t = (\sigma^2 + \theta^2\kappa)t,$$

and its third central moment is $(2\theta^3\kappa^2 + 3\sigma^2\theta\kappa)t$.

This process was introduced in Madan and Seneta (1990) and further extended
in Madan et al. (1998).


5.4.1.6 CGMY process

This process, called CGMY after the authors Carr et al. (2002), is also called
the generalized tempered stable process. The characteristic function of $Z_1$ has the
form:

$$\varphi_{Z_1}(t) = E e^{itZ_1} = \exp\left\{C\Gamma(-Y)\left((M - it)^Y + (G + it)^Y - M^Y - G^Y\right)\right\}$$

where $C > 0$, $G > 0$, $M > 0$ and $Y < 2$, but its density is not known in closed
form. Its Lévy measure is given by

$$\nu^{CGMY}(dx) = C\frac{e^{-Mx}}{x^{1+Y}}\mathbf{1}_{\{x > 0\}}\,dx + C\frac{e^{Gx}}{|x|^{1+Y}}\mathbf{1}_{\{x < 0\}}\,dx.$$

This process has Itô-Lévy decomposition

$$Z_t = tEZ_1 + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^{CGMY})(ds, dx)$$

and triplet $(EZ_1, 0, \nu^{CGMY})$. See Cont and Tankov (2004) for additional details.
This model contains, as special cases, the variance gamma and the bilateral
gamma processes. The distribution of $Z_1$ has finite moments of all orders.


5.4.1.7 Merton process 


Merton (1976) introduced one of the first jump models in finance. We start from
its Itô-Lévy decomposition, which reads

$$Z_t = \mu t + \sigma W_t + \sum_{k=1}^{N_t} J_k$$

with $J_k \sim N(\mu_J, \sigma_J^2)$, $k = 1, 2, \ldots$ The characteristic function of $Z_1$ is then

$$\varphi_{Z_1}(t) = Ee^{itZ_1} = \exp\left\{i\mu t - \frac{\sigma^2t^2}{2} + \lambda\left(e^{i\mu_Jt - \sigma_J^2t^2/2} - 1\right)\right\} \tag{5.6}$$

with Lévy triplet $(\mu, \sigma^2, \lambda \times f_J)$, with $f_J(\cdot)$ the density of $J_k$ and $\lambda$ the intensity
of the Poisson process $N_t$, $t \ge 0$. The density of $Z_1$ is not known in closed form
but it is possible to obtain

$$EZ_1 = \mu + \lambda\mu_J, \qquad \mathrm{Var}\,Z_1 = \sigma^2 + \lambda\mu_J^2 + \lambda\sigma_J^2$$

from the characteristic function (5.6).
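The step from (5.6) to the two moments can be checked numerically. The following is only a sketch: the function names, the step size `h` and the parameter values are illustrative assumptions, not taken from the text. It differentiates the characteristic function at zero by finite differences and compares the result with the closed-form mean and variance.

```r
# Characteristic function (5.6) of Z_1 in the Merton model.
merton_cf <- function(t, mu, sigma, lambda, muJ, sigmaJ) {
  exp(1i * mu * t - sigma^2 * t^2 / 2 +
        lambda * (exp(1i * muJ * t - sigmaJ^2 * t^2 / 2) - 1))
}

# Mean and variance of Z_1 from finite differences of the characteristic
# function at t = 0 (h is an illustrative step size, not from the text).
merton_moments_numeric <- function(mu, sigma, lambda, muJ, sigmaJ, h = 1e-4) {
  phi <- function(t) merton_cf(t, mu, sigma, lambda, muJ, sigmaJ)
  m1 <- Im((phi(h) - phi(-h)) / (2 * h))            # E Z_1   = -i * phi'(0)
  m2 <- -Re((phi(h) - 2 * phi(0) + phi(-h)) / h^2)  # E Z_1^2 = -phi''(0)
  c(mean = m1, var = m2 - m1^2)
}

# Closed-form moments quoted in the text.
merton_moments_exact <- function(mu, sigma, lambda, muJ, sigmaJ) {
  c(mean = mu + lambda * muJ,
    var = sigma^2 + lambda * muJ^2 + lambda * sigmaJ^2)
}

# Arbitrary illustrative parameter values.
num <- merton_moments_numeric(0.05, 0.2, 3, -0.1, 0.15)
exact <- merton_moments_exact(0.05, 0.2, 3, -0.1, 0.15)
print(rbind(num, exact))
```

The two rows should agree up to the finite-difference error, which is small for this step size.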
5.4.1.8 Kou process

The Kou (2002) process replaces the symmetric distribution of the jumps $J_k$
in the Merton model with an asymmetric double exponential distribution. This
distribution has density

$$f(x) = p\,\theta_1e^{-\theta_1x}\mathbf{1}_{\{x>0\}} + (1-p)\,\theta_2e^{\theta_2x}\mathbf{1}_{\{x<0\}} \tag{5.7}$$

with $p \in (0, 1)$, $\theta_1$ and $\theta_2$ positive parameters. This is a mixture of the densities of
two exponential random variables, one on the positive and one on the negative half-line, weighted by $p$ and $1 - p$. When a random variable
$X$ has a distribution with density (5.7), we write $X \sim DExp(p, \theta_1, \theta_2)$.
The Kou process then has Itô-Lévy decomposition

$$Z_t = \mu t + \sigma W_t + \sum_{k=1}^{N_t} J_k$$

with $J_k \sim DExp(p, \theta_1, \theta_2)$. The characteristic function of $Z_1$ is given by

$$\varphi_{Z_1}(t) = Ee^{itZ_1} = \exp\left\{it\mu - \frac{\sigma^2t^2}{2} + it\lambda\left(\frac{p}{\theta_1 - it} - \frac{1-p}{\theta_2 + it}\right)\right\}$$

and the Lévy triplet of $Z_t$ is $(\mu, \sigma^2, \lambda \times f)$, with $f(\cdot)$ from (5.7).
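A minimal simulation sketch of the Kou model at time 1 is given below. It assumes the convention that upward jumps occur with probability $p$ and are exponential with rate $\theta_1$, while downward jumps have exponential size with rate $\theta_2$; this is the convention consistent with the denominators $\theta_1 - it$ and $\theta_2 + it$ in the characteristic function. The function names `rdexp` and `rkou1` and the parameter values are hypothetical.

```r
# Sample n double exponential jumps: with probability p an upward
# Exp(theta1) jump, otherwise a downward jump of Exp(theta2) size.
rdexp <- function(n, p, theta1, theta2) {
  up <- runif(n) < p
  x <- -rexp(n, rate = theta2)           # downward jumps by default
  x[up] <- rexp(sum(up), rate = theta1)  # overwrite with upward jumps
  x
}

# Simulate nsim copies of Z_1 = mu + sigma * W_1 + sum_{k <= N_1} J_k.
rkou1 <- function(nsim, mu, sigma, lambda, p, theta1, theta2) {
  njumps <- rpois(nsim, lambda)
  jumps <- vapply(njumps,
                  function(k) sum(rdexp(k, p, theta1, theta2)),
                  numeric(1))
  mu + sigma * rnorm(nsim) + jumps
}

set.seed(42)
z <- rkou1(1e5, mu = 0, sigma = 0.2, lambda = 2, p = 0.3, theta1 = 5, theta2 = 4)
# Theoretical mean: mu + lambda * (p / theta1 - (1 - p) / theta2)
kou_mean <- 0 + 2 * (0.3 / 5 - 0.7 / 4)
c(empirical = mean(z), theoretical = kou_mean)
```

The empirical mean should be close to the theoretical one, which follows from differentiating the characteristic function at zero.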


5.4.1.9 Geometric Brownian motion as exponential Levy process 

Clearly, the geometric Brownian motion is a particular case of the exponential
Lévy model, with $Z_1 \sim N(\mu, \sigma^2)$, Itô-Lévy decomposition $Z_t = \mu t + \sigma W_t$ and
Lévy triplet $(\mu, \sigma^2, 0)$.

The GH, NIG and Meixner processes do not contain a Gaussian part (indeed
the second term of the Lévy triplet is always zero), so the jumps alone
drive their excursions. The Merton and Kou processes contain both a Gaussian
part and compound Poisson-type jumps. Finally, the Black and Scholes model has no
jumps. From this perspective it is clear that this way of modelling financial
assets is particularly rich.


5.5 Telegraph and geometric telegraph process 

Let $X_t$ be a telegraph process, i.e.

$$X_t = x_0 + \int_0^t V_s\,ds, \qquad V_t = V_0(-1)^{N_t}, \quad t > 0,$$

where $N_t$ is a Poisson process and $V_0$ is a discrete random variable independent of
$N_t$ and taking values $+c$ and $-c$ with equal probability. The so-called geometric
telegraph process was proposed in finance by Di Crescenzo and Pellerey (2002)
and has the following form:

$$S_t = S_0\exp\{\alpha t + \sigma X_t\}$$

with $\alpha = \mu - \frac{1}{2}\sigma^2$, $\mu \in \mathbb{R}$ and $\sigma > 0$ in analogy to the geometric Brownian
motion case, and $S_0$ a constant. Because the telegraph process has
finite variation, so does its geometric version. This is a drawback,
because pricing under this model admits arbitrage opportunities. But these
(standard and geometric) telegraph processes are the building blocks of other
stochastic models for finance not affected by this problem. Thus inference for
these models is an important task. Estimation of the telegraph process from
discrete observations was considered in De Gregorio and Iacus (2008) using an
approximate likelihood method and in Iacus and Yoshida (2008) via the method
of moments. We consider first the pseudo-likelihood approach. By taking
into account the transition density (3.26), the quasi-likelihood function for the
discretely observed telegraph process is given in the following form:


$$L_n(\lambda) = L_n(\lambda \mid X_0, X_1, \ldots, X_n) = \prod_{i=1}^n p(X_i, \Delta_n; X_{i-1}, t_{i-1}) \tag{5.8}$$

$$= \prod_{i=1}^n\left\{\frac{e^{-\lambda\Delta_n}}{2c}\left[\lambda I_0\!\left(\frac{\lambda}{c}\sqrt{u_{n,i}}\right) + \frac{c\lambda\Delta_n}{\sqrt{u_{n,i}}}\,I_1\!\left(\frac{\lambda}{c}\sqrt{u_{n,i}}\right)\right]\chi_{\{u_{n,i}>0\}} + \frac{e^{-\lambda\Delta_n}}{2}\,\delta(u_{n,i} = 0)\right\}$$

where $u_{n,i} = u_n(X_i, X_{i-1}) = c^2\Delta_n^2 - (X_i - X_{i-1})^2$ and $I_0$, $I_1$ are modified Bessel functions of the first kind. The density $p(X_i, \Delta_n;
X_{i-1}, t_{i-1})$ appearing in (5.8) is to be interpreted as the probability law of a
telegraph process initially located at $X_{i-1}$ that reaches the position $X_i$ at time
$t_i$. The construction of $L_n(\lambda)$ is based on the following assumption: the observed
increments $X_i - X_{i-1}$ are $n$ copies of the process $X(\Delta_n)$ (i.e. the process $X(t)$
up to time $\Delta_n$) and are treated as if they were independent. This is of course untrue,
but the estimators based on $L_n(\lambda)$ possess reasonable properties. Notice that
at this stage, the only parameter of interest is the intensity $\lambda$ of the underlying
homogeneous Poisson process. Given the following contrast function

$$F(\lambda; X_1, \ldots, X_n) = \frac{\partial}{\partial\lambda}\log L_n(\lambda) \tag{5.9}$$

we consider the estimator $\hat\lambda_n$ solution to $F(\lambda) = 0$, i.e.

$$\hat\lambda_n : F(\lambda = \hat\lambda_n; X_1, \ldots, X_n) = 0. \tag{5.10}$$

It is possible to prove uniqueness of this estimator as solution of (5.10) and,
under the additional conditions $n\Delta_n = T$, $\Delta_n \to 0$ as $n \to \infty$ and $T$ fixed, the
estimator $\hat\lambda_n$ converges to $N_T/T$, which is the optimal estimator of $\lambda$ for
continuous time observations. For distributional theory and consistency we need
to let $T \to \infty$ as in the continuous case. This is exploited using the method of
moments. Let us consider the increments $\eta_i$ of the telegraph process defined as

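$$\eta_i = X_i - X_{i-1} = V_0\int_{t_{i-1}}^{t_i}(-1)^{N(s)}\,ds = V(t_{i-1})\int_{t_{i-1}}^{t_i}(-1)^{N(s)-N(t_{i-1})}\,ds.$$

These increments are stationary but not independent. Conversely, the squared
increments

$$\eta_i^2 = c^2\left(\int_{t_{i-1}}^{t_i}(-1)^{N(s)-N(t_{i-1})}\,ds\right)^2$$

(or the absolute increments $|\eta_i|$) are independent. This allows us to introduce a
method of moments estimator for $\lambda$. Indeed, the statistic

$$V_n = \frac{1}{n}\sum_{i=1}^n\frac{\eta_i^2}{\Delta_n^2}$$

is an unbiased estimator of

$$g_n(\lambda) = \frac{c^2}{\lambda\Delta_n^2}\left(\Delta_n - \frac{1 - e^{-2\lambda\Delta_n}}{2\lambda}\right).$$

We define the moment type estimator $\check\lambda_n$ as the unique solution to the equation

$$\frac{1}{n}\sum_{i=1}^n\eta_i^2 - \frac{c^2}{\lambda}\left(\Delta_n - \frac{1 - e^{-2\lambda\Delta_n}}{2\lambda}\right) = 0. \tag{5.11}$$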


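This estimator is given in implicit form, but using a Taylor expansion we can
derive an approximate but explicit moment type estimator. Taking into account
the moment formula (3.30) we obtain:

$$\begin{aligned}
E\{X_i - X_{i-1}\}^2 &= \frac{c^2}{\lambda}\left(\Delta_n - \frac{1 - e^{-2\lambda\Delta_n}}{2\lambda}\right)\\
&= \frac{c^2}{\lambda}\left(\Delta_n - \frac{2\lambda\Delta_n - \frac{1}{2}(-2\lambda\Delta_n)^2 - \frac{1}{6}(-2\lambda\Delta_n)^3}{2\lambda}\right) + o(\Delta_n^3)\\
&= c^2\Delta_n^2 - \frac{2}{3}c^2\lambda\Delta_n^3 + o(\Delta_n^3).
\end{aligned}$$

Therefore, an approximate moment type estimator is the following:

$$\lambda_n^* = \frac{3}{2\Delta_n} - \frac{3}{2nc^2\Delta_n^3}\sum_{i=1}^n\eta_i^2 = \frac{3}{2n\Delta_n}\sum_{i=1}^n\left(1 - \frac{\eta_i^2}{c^2\Delta_n^2}\right) \tag{5.12}$$

and $\lambda_n^*$ is a weighted sum of the independent random variables $\eta_i^2$. Let $\lambda_0$ denote
the true value of the parameter $\lambda$, then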

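Theorem 5.5.1 Let $\check\lambda_n$ be the moment type estimator defined by (5.11) and $\lambda_n^*$ its approximation (5.12), and suppose that $n\Delta_n \to
\infty$, $\Delta_n \to 0$ as $n \to \infty$. Then $\check\lambda_n$ is a consistent estimator of $\lambda_0$. Moreover,

$$\sqrt{n\Delta_n}\,(\check\lambda_n - \lambda_0) \xrightarrow{d} N\left(0, \tfrac{6}{5}\lambda_0\right)$$

as $n \to \infty$, where $\xrightarrow{d}$ denotes convergence in distribution. The estimator
$\lambda_n^*$ has the same asymptotic distribution as $\check\lambda_n$ under the additional condition
$n\Delta_n^3 \to 0$.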
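The approximate moment estimator (5.12) can be tried on simulated data. The sketch below is only illustrative: the simulator `sim_telegraph`, the helper `lambda_star` and all parameter values are assumptions made for this example, not code from the text. The path is built exactly from the Poisson switching times, so the values on the observation grid are exact up to floating point error.

```r
# Simulate a telegraph process on the grid 0, Delta, ..., n * Delta.
# Switching times come from a Poisson process with rate lambda; the path
# is piecewise linear with slope +vel or -vel between switches.
sim_telegraph <- function(n, Delta, lambda, vel, x0 = 0) {
  Tend <- n * Delta
  m <- rpois(1, lambda * Tend)                   # number of switches on [0, Tend]
  breaks <- c(0, sort(runif(m, 0, Tend)), Tend)  # switch times plus end points
  v0 <- sample(c(-vel, vel), 1)                  # initial velocity
  slopes <- v0 * (-1)^(seq_along(diff(breaks)) - 1)
  knots <- x0 + cumsum(c(0, diff(breaks) * slopes))
  # The path is linear between breaks, so linear interpolation is exact.
  approx(breaks, knots, xout = (0:n) * Delta)$y
}

# Approximate moment type estimator (5.12), with the velocity assumed known.
lambda_star <- function(X, Delta, vel) {
  eta <- diff(X)
  3 / (2 * length(eta) * Delta) * sum(1 - eta^2 / (vel^2 * Delta^2))
}

set.seed(123)
X <- sim_telegraph(n = 50000, Delta = 0.01, lambda = 2, vel = 1)
est <- lambda_star(X, Delta = 0.01, vel = 1)
est
```

With $n\Delta_n = 500$ and a true intensity of 2, the estimate should fall close to 2, with a standard error of roughly $\sqrt{(6/5)\lambda_0/(n\Delta_n)}$ as suggested by Theorem 5.5.1.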

Thus the moment type estimators for the discretely observed telegraph process are
in general not asymptotically efficient, because their asymptotic variance
is $\frac{6}{5}\lambda_0$ while the optimal variance in this setting is $\lambda_0$. We finally show
how to obtain an asymptotically efficient estimator, and we also give the proof
of the result because it makes use of the material in Chapter 2 and the properties of
the Poisson process. Let us use the notation $E_0$ and $\mathrm{Var}_0$ to indicate respectively
the expected value and the variance operator under $P_0$, the law corresponding to the
true value of the parameter $\lambda = \lambda_0$. Consider the following statistic

$$\bar\lambda_n = \frac{1}{n\Delta_n}\sum_{i=1}^n\mathbf{1}_{\{|\eta_i| < c\Delta_n\}} = \frac{1}{n\Delta_n}\sum_{i=1}^n\mathbf{1}_{\{N([t_{i-1}, t_i)) \ge 1\}}. \tag{5.13}$$

The statistic $\bar\lambda_n$ is not a good estimator of $\lambda$ for fixed $\Delta_n$. Indeed,

$$E_0\bar\lambda_n = \frac{1}{n\Delta_n}\sum_{i=1}^nE_0\{\mathbf{1}_{\{|\eta_i| < c\Delta_n\}}\} = \frac{1 - e^{-\lambda_0\Delta_n}}{\Delta_n}.$$

Therefore, we consider the following estimator

$$\tilde\lambda_n = -\frac{1}{\Delta_n}\log(1 - \Delta_n\bar\lambda_n) \tag{5.14}$$

and Theorem 5.5.2 proves that it is the efficient estimator in this context.
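The bias of the statistic (5.13) and its correction (5.14) can be illustrated without simulating the whole path. By (5.13), each indicator equals 1 exactly when the Poisson process has at least one event in the corresponding interval, so the indicators are independent Bernoulli variables with success probability $1 - e^{-\lambda_0\Delta_n}$. The sketch below relies on that fact; the function names and the deliberately large $\Delta_n = 0.5$ are illustrative assumptions.

```r
# Statistic (5.13): proportion of intervals with at least one switch,
# rescaled by the interval length.
lambda_bar <- function(hit, Delta) sum(hit) / (length(hit) * Delta)

# Estimator (5.14): log transform that removes the fixed-Delta bias.
lambda_eff <- function(hit, Delta) {
  -log(1 - Delta * lambda_bar(hit, Delta)) / Delta
}

set.seed(1)
lambda0 <- 2
Delta <- 0.5
n <- 20000
# hit[i] = 1 if the Poisson process has at least one event in interval i.
hit <- rbinom(n, 1, 1 - exp(-lambda0 * Delta))

bar_target <- (1 - exp(-lambda0 * Delta)) / Delta  # E_0 of (5.13)
res <- c(bar = lambda_bar(hit, Delta),
         bar_target = bar_target,
         eff = lambda_eff(hit, Delta))
res
```

For this coarse sampling step the statistic (5.13) concentrates near $(1 - e^{-1})/0.5 \approx 1.26$ rather than 2, while (5.14) recovers a value close to the true intensity.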

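Theorem 5.5.2 (Iacus and Yoshida 2008) Let $\Delta_n \to 0$, $n\Delta_n \to \infty$ as $n \to \infty$.
Then the estimator $\tilde\lambda_n$ in (5.14) is consistent, asymptotically normal and attains
the minimal variance, i.e. it is asymptotically efficient:

$$\sqrt{n\Delta_n}\,(\tilde\lambda_n - \lambda_0) \xrightarrow{d} N(0, \lambda_0).$$

Proof. In order to prove consistency and asymptotic normality of $\tilde\lambda_n$ we first
prove the same properties for $\bar\lambda_n$. Let us consider the following quantity

$$\begin{aligned}
U_n &= \sqrt{n\Delta_n}\,(\bar\lambda_n - E_0\{\bar\lambda_n\})\\
&= \frac{1}{\sqrt{n\Delta_n}}\sum_{i=1}^n\left\{\mathbf{1}_{\{|\eta_i| < c\Delta_n\}} - E_0\{\mathbf{1}_{\{|\eta_i| < c\Delta_n\}}\}\right\}\\
&= \frac{1}{\sqrt{n\Delta_n}}\sum_{i=1}^n\left\{\mathbf{1}_{\{|\eta_i| < c\Delta_n\}} - (1 - e^{-\lambda_0\Delta_n})\right\} = \sum_{i=1}^n\xi_i
\end{aligned}$$

with

$$\xi_i = \frac{1}{\sqrt{n\Delta_n}}\left\{\mathbf{1}_{\{|\eta_i| < c\Delta_n\}} - (1 - e^{-\lambda_0\Delta_n})\right\}.$$

We have that $E_0\{\xi_i\} = 0$, thus $E_0\{U_n\} = 0$. Moreover,

$$\begin{aligned}
n\Delta_n\mathrm{Var}_0\{\xi_i\} &= \mathrm{Var}_0\{\mathbf{1}_{\{|\eta_i| < c\Delta_n\}}\} = E_0\{\mathbf{1}_{\{|\eta_i| < c\Delta_n\}}\}\left(1 - E_0\{\mathbf{1}_{\{|\eta_i| < c\Delta_n\}}\}\right)\\
&= (1 - e^{-\lambda_0\Delta_n})e^{-\lambda_0\Delta_n} = \lambda_0\Delta_n + o(\lambda_0\Delta_n)
\end{aligned}$$

and

$$\mathrm{Var}_0\{U_n\} = \frac{1}{n\Delta_n}\,n\left(\lambda_0\Delta_n + o(\lambda_0\Delta_n)\right) = \lambda_0 + o(1).$$

Finally, the $\xi_i$'s are independent because they only involve the absolute value
of the increments $\eta_i$. Since $|\xi_i| \le 1/\sqrt{n\Delta_n}$, Lindeberg's condition in
Theorem 2.4.17 trivially holds true: for every $\epsilon > 0$,

$$\sum_{i=1}^nE_0\left\{\mathbf{1}_{\{|\xi_i| > \epsilon\}}\xi_i^2\right\} \to 0,$$

therefore $U_n \xrightarrow{d} N(0, \lambda_0)$. Now we need to prove asymptotic normality of $\tilde\lambda_n$
in (5.14). Let $f_n(u) = -\frac{1}{\Delta_n}\log(1 - u\Delta_n)$, then $f_n'(u) = \frac{d}{du}f_n(u) = \frac{1}{1 - u\Delta_n}$ and
$\tilde\lambda_n = f_n(\bar\lambda_n)$. Further,

$$\lambda_0 = f_n(\bar\lambda_0) \quad\text{where}\quad \bar\lambda_0 = \frac{1 - e^{-\lambda_0\Delta_n}}{\Delta_n} = E_0\{\bar\lambda_n\}.$$

By using the $\delta$-method (2.29), we obtain

$$\begin{aligned}
\sqrt{n\Delta_n}\,(\tilde\lambda_n - \lambda_0) &= \sqrt{n\Delta_n}\,(f_n(\bar\lambda_n) - f_n(\bar\lambda_0))\\
&= \sqrt{n\Delta_n}\,(\bar\lambda_n - \bar\lambda_0)f_n'(\bar\lambda_0) + o_p\left(\sqrt{n\Delta_n}\,|\bar\lambda_n - \bar\lambda_0|\right)\\
&= U_n\frac{1}{1 - \bar\lambda_0\Delta_n} + o_p(1)
\end{aligned}$$

hence $\sqrt{n\Delta_n}\,(\tilde\lambda_n - \lambda_0) \xrightarrow{d} N(0, \lambda_0)$.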

Consider now the geometric telegraph process

$$S_t = S_0\exp\{\alpha t + \sigma X_t\}$$

and assume we observe the log-returns

$$Y_i = \log\frac{S_i}{S_{i-1}} = \alpha\Delta_n + \sigma(X_i - X_{i-1})$$

where $S_i = S(t_i)$ are discrete observations from the geometric telegraph process.
We assume $\mu$ to be known, which is usually the case in finance where $\mu$ is
related to the expected return of non-risky assets like bonds, etc. The parameters
$\sigma$ and $\lambda$ are to be estimated. We assume the $Y_i$ to be $n$ copies of the process

$$Y(\Delta_n) = \alpha\Delta_n + \sigma X(\Delta_n)$$

with $X(\Delta_n) = X_i - X_{i-1}$ and $X(0) = x_0 = 0$. Therefore, by (3.29), we have

$$E\{Y(\Delta_n)\} = \alpha\Delta_n$$

and by (3.30) we obtain

$$\mathrm{Var}\{Y(\Delta_n)\} = \sigma^2\,\mathrm{Var}\,X(\Delta_n) = \frac{\sigma^2c^2}{\lambda}\left(\Delta_n - \frac{1 - e^{-2\lambda\Delta_n}}{2\lambda}\right). \tag{5.15}$$


A good estimator of the volatility $\sigma$ can be derived from the sample mean of the
log-returns. Indeed,

$$\bar Y_n = \frac{1}{n}\sum_{i=1}^nY_i = \alpha\Delta_n + \frac{\sigma}{n}X_n \tag{5.16}$$

and

$$E\bar Y_n = \alpha\Delta_n + \frac{\sigma}{n}EX_n = \alpha\Delta_n = \left(\mu - \frac{1}{2}\sigma^2\right)\Delta_n \tag{5.17}$$

again by (3.29) and by the properties of the log-returns. From (5.17) we have
that

$$\sigma^2 = 2\left(\mu - \frac{E\bar Y_n}{\Delta_n}\right)$$

from which the following unbiased estimator of $\sigma^2$ can be derived

$$\hat\sigma_n^2 = 2\left(\mu - \frac{\bar Y_n}{\Delta_n}\right).$$

Therefore, a reasonable moment type estimator of $\sigma$ is

$$\hat\sigma_n = \sqrt{2\left(\mu - \frac{\bar Y_n}{\Delta_n}\right)} \tag{5.18}$$

which does not always exist because there is no guarantee that $\mu > \bar Y_n/\Delta_n$. We then
use $\hat\sigma_n$ to estimate $\lambda$ making use of (5.15). Let


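$$s_Y^2 = \frac{1}{n}\sum_{i=1}^n(Y_i - \bar Y_n)^2,$$

then the proposed estimator of $\lambda$ is

$$\hat\lambda_n = \arg\min_{\lambda > 0}\left(s_Y^2 - \frac{\hat\sigma_n^2c^2}{\lambda}\left(\Delta_n - \frac{1 - e^{-2\lambda\Delta_n}}{2\lambda}\right)\right)^2. \tag{5.19}$$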


5.5.1 Filtering of the geometric telegraph process 

If the velocity $c$ is not known one can proceed as follows: set

$$Z_i = \frac{Y_i - E\bar Y_n}{\sigma} = X_i - X_{i-1} = X(\Delta_n),$$

an estimator of the increments of the telegraph process is

$$\hat Z_i = \frac{Y_i - \bar Y_n}{\hat\sigma_n} = \hat X(\Delta_n), \quad i = 1, \ldots, n.$$

Then

$$\hat Z_1 = \hat X_1, \quad \hat Z_2 + \hat Z_1 = \hat X_2, \quad \hat Z_3 + \hat Z_2 + \hat Z_1 = \hat X_3, \quad \ldots$$

where the $\hat X_i$ are the estimated states of the underlying telegraph process. From
these estimates, one can proceed as in the previous sections and estimate both $\lambda$ and
$c$. From both estimated and true increments of the underlying telegraph process
it is possible to obtain the following consistent estimator of $c$


c 


n 


" h A " 
This is a consistent estimator of $c$, because



then $\hat c_n \to c$ as $\Delta_n \to 0$. This estimator can subsequently be used to
estimate $\lambda$, $\alpha$ and $\sigma$ by putting $\hat c_n$ in place of $c$ in all expressions.


5.6 Solution to exercises 


Solution 5.1 (to Exercise 5.1) We can write the log-likelihood as

$$-2\sum_{i=2}^n\log X_i + \sum_{i=2}^n\log p^{CIR}\left(\Delta, \frac{1}{X_i}\,\Big|\,\frac{1}{X_{i-1}}\right)$$

and hence the corresponding negative log-likelihood can be coded as follows:

R> library(sde)
R> AhnGao.loglik <- function(theta1, theta2, theta3) {
+     n <- length(X)
+     dt <- deltat(X)
+     # minus the CIR log-density of 1/X, plus the Jacobian term
+     -sum(dcCIR(x = 1/X[-1], Dt = dt, x0 = 1/X[-n],
+         theta = c(theta1, theta2, theta3), log = TRUE)) +
+         2 * sum(log(X[-1]))
+ }

To test it we use the U.S. interest rates again

R> library(Ecdat)
R> data(Irates)
R> X <- Irates[, "r1"]
R> fit <- mle(AhnGao.loglik, start = list(theta1 = 0.1, theta2 = 0.1,
+     theta3 = 0.3), method = "L-BFGS-B", lower = rep(0.001, 3),
+     upper = rep(1, 3))

and the final estimates are as follows:

R> coef(fit)
   theta1    theta2    theta3
0.3200484 0.9691536 0.5215643


5.7 Bibliographical notes 

Inference for stochastic processes is a wide field still in development. Most continuous
time processes in finance are diffusion processes. To deeply understand
the theory of estimation starting from continuous time, one should refer at least to
Kutoyants (2004). Books which consider discrete time observations are Prakasa
Rao (1999) and Iacus (2008). For Poisson processes one good starting reference
is Kutoyants (1998). The literature on estimation for Lévy processes is still very
sparse, but there are several examples of estimation techniques for particular
models in, e.g., Schoutens (2003), Cont and Tankov (2004) and Raible (2000).
The above list is not exhaustive but is a good starting point for further reading.


References 

Abramowitz, M. and Stegun, I. (1964). Handbook of Mathematical Functions. Dover 
Publications, New York. 

Aït-Sahalia, Y. (1996a). Nonparametric pricing of interest rate derivative securities.
Econometrica 64, 527-560.

Aït-Sahalia, Y. (1996b). Testing continuous-time models of the spot interest rate. Rev.
Financial Stud. 9, 2, 385-426.

Barndorff-Nielsen, O. and Shephard, N. (2001). Normal modified stable processes. Theory
of Probability and Mathematical Statistics 65, 1-19.

Barndorff-Nielsen, O. E. (1997). Normal inverse gaussian distributions and stochastic 
volatility modelling. Scand. J. Statist. 24, 1-13. 

Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. Journal 
of Political Economy 81, 637-654. 

Brennan, M. and Schwartz, E. (1980). Analyzing convertible securities. J. Financial 
Quant. Anal. 15, 4, 907-929. 

Carr, P., Geman, H., Madan, D., and Yor, M. (2002). The fine structure of asset returns: 
an empirical investigation. J. Business 75, 305-332. 

Chan, K., Karolyi, G., Longstaff, F., and Sanders, A. (1992). An empirical investigation 
of alternative models of the short-term interest rate. J. Finance 47, 1209-1227. 

Cont, R. and Tankov, P. (2004). Financial Modelling With Jump Processes. Chapman & 
Hall/CRC, Boca Raton. 

Cox, J., Ingersoll, J., and Ross, S. (1980). An analysis of variable rate loan contracts. 
J. Finance 35, 2, 389-403. 

Cox, J., Ingersoll, J., and Ross, S. (1985). A theory of the term structure of interest rates. 
Econometrica 53, 385-408. 

De Gregorio, A. and Iacus, S. M. (2008). Parametric estimation for the standard and 
the geometric telegraph process observed at discrete times. Statistical Inference for 
Stochastic Processes 11, 249-263. 

Di Crescenzo, A. and Pellerey, F. (2002). On prices' evolutions based on geometric
telegrapher's process. Applied Stochastic Models in Business and Industry 18, 171-184.

Dothan, U. (1978). On the term structure of interest rates. J. Financial Econ. 6, 59-69. 

Genon-Catalot, V. and Jacod, J. (1993). On the estimation of the diffusion coefficient for 
multidimensional diffusion processes. Ann. Inst. Henri Poincare 29, 119-151. 

Iacus, S. (2008). Simulation and Inference for Stochastic Differential Equations. With R 
Examples. Springer Series in Statistics, Springer, New York. 

Iacus, S. and Yoshida, N. (2008). Estimation for the discretely observed telegraph process. 
Theory of Probability and Mathematical Statistics 78, 33-43. 




Jacod, J. (1997). On continuous conditional gaussian martingales and stable convergence 
in law. Seminaire de probabilites (Strasbourg) 31, 232-246. 

Jacod, J. (2002). On processes with conditional independent increments and stable
convergence in law. Seminaire de probabilites (Strasbourg) 36, 383-401.

Kou, S. G. (2002). A jump diffusion model for option pricing. Manag. Sci. 48, 
1086-1101. 

Kutoyants, Y. (1998). Statistical inference for spatial Poisson processes. Lecture Notes 
in Statistics, Springer-Verlag, New York. 

Kutoyants, Y. (2004). Statistical Inference for Ergodic Diffusion Processes. Springer- 
Verlag, London. 

Madan, D., Carr, P., and Chang, E. (1998). The variance gamma process and option
pricing. European Finance Review 2, 79-105.

Madan, D. and Seneta, E. (1990). The variance gamma (v.g.) model for share market 
returns. Journal of Business 64, 4, 511-524. 

Merton, R. C. (1973). Theory of rational option pricing. Bell Journal of Economics and
Management Science 4, 1, 141-183.

Merton, R. C. (1976). Option pricing with discontinuous returns. Bell J. Financ. Econ. 3, 
145-166. 

Phillips, P. and Yu, J. (2009). Maximum likelihood and gaussian estimation of continuous 
time models in finance. Handbook of Financial Time Series, Springer 497-530. 

Prakasa Rao, B. (1999). Statistical Inference for Diffusion Type Processes. Oxford
University Press, New York.

Raible, S. (2000). Lévy Processes in Finance: Theory, Numerics, and Empirical Facts.
PhD Thesis, University of Freiburg, http://www.freidok.uni-freiburg.de/volltexte/51/,
Freiburg.

Schoutens, W. (2003). Lévy Processes in Finance. John Wiley & Sons, Ltd, Chichester.

Schoutens, W. and Teugels, J. (1998). Lévy processes, polynomials and martingales.
Commun. Statist.- Stochastic Models 14, 1, 335-349.

Sørensen, M. (2009). Parametric inference for discretely sampled stochastic differential
equations. Handbook of Financial Time Series, Springer 531-553.

Tweedie, M. (1984). An index which distinguishes between some important exponential
families. In Statistics: Applications and New Directions: Proc. Indian Statistical
Institute Golden Jubilee International Conference (ed. J. Ghosh and J. Roy) 579-604.

Vasicek, O. (1977). An equilibrium characterization of the term structure. J. Financial 
Econ. 5, 177-188. 

Yoshida, N. (1992). Estimation for diffusion processes from discrete observation.
J. Multivar. Anal. 41, 2, 220-242.


6 


European option pricing 


6.1 Contingent claims 

In this chapter we focus on determining the fair price of a derivative,
such as an option. We assume geometric Brownian motion for the asset price dynamics
and use the pricing formulas of the Black and Scholes (1973) and Merton (1973) model.

Although we mainly discuss European call options, we also discuss the pricing of
generic contingent claims, which include calls and puts but also Asian options or
options with barriers. Options of American type require special attention and
hence will be treated separately in Chapter 7.

In Chapter 8 we will consider deviations from the theory presented here. The
main ingredient to be replaced in the Black and Scholes theory will be the model
which describes asset prices, so as to admit jumps or non-Gaussian returns.

We assume that there is a probability space $(\Omega, \mathcal{A}, P)$ and a filtration
$\mathcal{F} = \{\mathcal{F}_t, t \ge 0\}$. When not explicitly mentioned, all processes are adapted to
the filtration $\mathcal{F}$.

Definition 6.1.1 A T-contingent claim or T-claim is a contract which pays to the
holder a stochastic amount $X$ at time $T$. The random variable $X$ is $\mathcal{F}_T$-measurable
and $T$ is called the exercise time of the contingent claim.

The definition of T-claim explicitly requires $\mathcal{F}_T$-measurability of $X$ because
of the fundamental principle of finance according to which only the information up to
some given $T$ (and not future information) determines the price. In this setup,
the information is the one generated by Brownian motion paths up to time $T$.

All contingent claims have an associated payoff function $f(\cdot)$, usually evaluated
at the point $S_T$ (the final value of the asset price at time $T$). If $X = f(S_T)$
then $X$ depends on $S_T$, which in turn depends on $B_T$. Therefore, $X$ is clearly
$\mathcal{F}_T$-measurable.

European call options have a payoff function of the form $f(x) = \max(x - K, 0)$,
while for put options we have $f(x) = \max(K - x, 0)$. Here $K$ is the
strike price. Indeed, for a call option the payoff is not null only if $S_T$ is higher
than the strike price $K$, i.e. $S_T - K > 0$. Hence, its payoff can be written as
$f(S_T) = \max(S_T - K, 0)$, and similarly for the put option. The next code shows
how to plot the payoff functions for the call and put options represented in
Figure 6.1.

R> f.call <- function(x) sapply(x, function(x) max(c(x - K, 0)))
R> f.put <- function(x) sapply(x, function(x) max(c(K - x, 0)))
R> K <- 1
R> curve(f.call, 0, 2, main = "Payoff functions", col = "blue",
+     lty = 1, lwd = 1, ylab = expression(f(x)))
R> curve(f.put, 0, 2, col = "black", add = TRUE, lty = 2, lwd = 2)
R> legend(0.9, 0.8, c("call", "put"), lty = c(1, 2), col = c("blue",
+     "black"), lwd = c(2, 2))


Figure 6.1 The payoff function of call and put options.


A barrier option is characterized by the fact that some threshold $\beta$ is given
and the contingent claim pays (like a put or a call) only if, in $[0, T]$, the
underlying asset never passes the threshold $\beta$. Imagine a call option with barrier
$\beta$ and strike price $K$. The payoff will be of the form:

$$X = \mathbf{1}_{\{S_t < \beta,\ t \le T\}}\max(S_T - K, 0)$$

and the payoff $X$ depends on the whole trajectory $S_t$, $t \le T$, and hence on $B_t$,
$t \le T$. So it is still $\mathcal{F}_T$-measurable.

An Asian option is another contingent claim, whose payoff is a
function of the average of the values of the underlying asset calculated over the
whole time interval $[0, T]$. An example of an Asian option is a claim which pays
only if the average price is higher than the strike price

$$X = \max\left(\frac{1}{T}\int_0^TS_t\,dt - K,\ 0\right)$$

which satisfies all the hypotheses.

One powerful technique to find the fair price of options is the equivalent
martingale measure approach. We will discuss this technique in detail later.
Other approaches rely on the rare cases in which the Itô formula can be used to
solve $f(S_T)$ explicitly. We will see that this is not always trivial because payoff
functions are usually not differentiable.
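The payoffs discussed so far can be evaluated on a discretely sampled price path. The sketch below is an illustration only: the function names are hypothetical, the path is a made-up vector, and the time average of the Asian option is approximated by the arithmetic mean of the sampled prices rather than by the integral over $[0, T]$.

```r
# Payoffs evaluated on a discretely sampled path S (numeric vector whose
# last element plays the role of S_T). K is the strike, beta the barrier.
payoff_call <- function(S, K) max(S[length(S)] - K, 0)
payoff_put <- function(S, K) max(K - S[length(S)], 0)

# Barrier call: pays like a call only if the path stays below beta.
payoff_barrier_call <- function(S, K, beta) {
  if (all(S < beta)) payoff_call(S, K) else 0
}

# Asian call: the integral average is approximated by the sample mean.
payoff_asian_call <- function(S, K) max(mean(S) - K, 0)

path <- c(1, 1.2, 0.9, 1.3)  # made-up prices at equally spaced times
c(call = payoff_call(path, K = 1),
  put = payoff_put(path, K = 1),
  barrier_low = payoff_barrier_call(path, K = 1, beta = 1.25),
  barrier_high = payoff_barrier_call(path, K = 1, beta = 1.5),
  asian = payoff_asian_call(path, K = 1))
```

The barrier payoff vanishes as soon as one sampled price reaches the threshold, while the call and Asian payoffs depend only on the final price and on the average, respectively.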

Before discussing the continuous time Black and Scholes model, we start
with the analysis of a market with one single period. This simplified exposition
allows us to explain other basic concepts such as risk neutral measure, hedging
and arbitrage.

6.1.1 The main ingredients of option pricing 

Before going into detail, it is useful to get the general idea of what is reasonable
to ask of an option price theory. We do not discuss technical assumptions in
detail, but just the general idea for the moment.

One of the natural assumptions is that the market in which the
prices are fixed does not admit arbitrage. Arbitrage-free markets are markets in
which it is not possible to start without capital and gain a positive amount of money
with certainty. On real markets and in the short run, this assumption may not
always be verified due to asymmetry of information among the actors, but in
the (not so) long run the market actors tend to behave in a way that eliminates
arbitrage opportunities. If there are no arbitrage opportunities, then the prices on
the market are fair. We are particularly interested here in the price of derivatives.
Denote by $P_0$ the (fair) price of a derivative like a contingent T-claim at time 0.

As mentioned in Chapter 1, the writer who sells the contract should protect
himself from potential loss, so he has to set up a hedging portfolio at time 0 such
that at time $T$ the risk associated with the exercise of the option
is covered by the value of the portfolio. We denote by $H_t$ the value of the
hedging portfolio at time $t \in [0, T]$. When this hedging portfolio exists for all
kinds of derivatives exchanged on the market, we say that the market is also
complete. Completeness is rarely observed in real markets for several reasons
which we will discuss later (for example, transaction costs have an impact on
the management of the hedging portfolio) but it is not unreasonable to ask that
the writer protect himself and that, at the same time, the holder of the derivative be
guaranteed the right to exercise the option at maturity. So hedging means
that $H_T = X$, where $X$ is the payoff of the contingent claim. This is clearly an
equality between random variables.

Non-arbitrage means that we cannot obtain a positive gain with probability one if
we start without capital, and that the price of the derivative is fair. This is written as
$P_0 = H_0$. Clearly, the hedging portfolio should contain a risky part and a
risk-free part, but it cannot contain the derivative itself. So we
can write the value of the portfolio at time $t$ as $H_t = aS_t + bR_t$, where $a$ is the
quantity of assets and $b$ the quantity of risk-free activities, e.g. bonds, $R_t$. Notice
that, in general, $a$ and $b$ may also be functions of $t$, but this implies a continuous
adjustment of the composition of the portfolio. This continuous adjustment is in
practice prevented by transaction costs, and this motivates the above comment
about the absence of completeness in real markets.

Finally, because the derivatives can be exchanged on the market at any time up to maturity
(even if the holder can exercise them only at maturity) and because we
assume arbitrage-free markets, we also have to ask that $P_t = H_t$, with $P_t$ the price
of the derivative at any time $t \in [0, T]$. So we would like to solve this problem
by considering the sequence of equalities:

$$P_0 = H_0 = aS_0 + bR_0, \qquad P_t = H_t = aS_t + bR_t \qquad \text{(no arbitrage)}$$

$$H_T = X \qquad \text{(completeness)}$$

Now notice that $P_0$ and hence $H_0$ are nonstochastic, while both $X$ and $H_T$ are stochastic,
but we need to fix $P_0$ (and $H_0$). The idea is then to consider the expected payoff
$EX = EH_T$ and then discount the value of $EH_T$ as if $X$ were constant in order
to obtain $H_0$ and then fix both $P_0$ and the composition of the hedging portfolio.
Loosely speaking, we would like to write $H_0 = (1 + r)^{-1}EX$. This is not possible
in this simple way, but to explain how it may become possible we start with the
simple one period market.

6.1.2 One period market 

Imagine a market where only a single risky asset and a risk-free bond are
exchanged. We assume that the asset at time 0 has an initial value of $s_0$, i.e.
$S_0 = s_0$, and that the bond has a value of 1, i.e. $b_0 = 1$. We further assume that,
at time $T$, only two possible events may occur, $\Omega = \{\omega_1, \omega_2\}$, with probability
$P(\omega_1) = p$ and $P(\omega_2) = 1 - p$, $p \in (0, 1)$. Therefore, either $S(T, \omega_1) = s_1$ or
$S(T, \omega_2) = s_2$. For simplicity we assume $s_1 > s_0 > s_2$. Given the assumptions, we
have $P(S_T = s_1) = p$ and $P(S_T = s_2) = 1 - p$. We are in a one-period market
and we assume that the interest rate is $r$, so at time $T$ the value of the bond
is $b_T = 1 + r$. Let us now construct a portfolio $H$ by buying $a$ assets and $b$ bonds.
At time 0, our investment has a cost $H_0$ given by

$$H_0 = a \cdot s_0 + b \cdot 1,$$

while at time $T$, either

$$H(T, \omega_1) = a \cdot s_1 + b \cdot (1 + r)$$

with probability $p$ or

$$H(T, \omega_2) = a \cdot s_2 + b \cdot (1 + r)$$

with probability $1 - p$. So the expected value of our investment at time $T$ is
given by

$$E_PH_T = pH(T, \omega_1) + (1 - p)H(T, \omega_2)$$

where we explicitly use the notation $E_P$ to denote that the expected value is
calculated under the measure $P$.

Now the question is the following: can we change the probability measure
from $P$ $\{P(\omega_1) = p, P(\omega_2) = 1 - p\}$ to another measure $Q$ $\{Q(\omega_1) =
q, Q(\omega_2) = 1 - q\}$, $q \in (0, 1)$, such that we can write

$$H_0 = (1 + r)^{-1}E_QH_T\,?$$

That is, can we find a constant $q$ such that

$$H_0 = (1 + r)^{-1}\left(qH(T, \omega_1) + (1 - q)H(T, \omega_2)\right)\,?$$

If this new probability measure $Q$ exists, then we can discount the final value
of the portfolio $H_T$ and obtain the initial amount $H_0$ of investment needed at
time zero in order to obtain that exact expected value $H_T$ at the end, i.e. we
are completely eliminating the risk of our investment and treating the risky portfolio
as if it were a bond. If this $Q$ exists, then we call the new measure $Q$ the risk
neutral measure. Let us try to verify if such a constant $q$ exists. We need to
find $q$ such that

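$$a \cdot s_0 + b = (1 + r)^{-1}\left(q(a \cdot s_1 + b \cdot (1 + r)) + (1 - q)(a \cdot s_2 + b \cdot (1 + r))\right)$$

which we can manipulate as follows:

$$\begin{aligned}
(a \cdot s_0 + b)(1 + r) &= q(a \cdot s_1 + b \cdot (1 + r)) + a \cdot s_2 + b \cdot (1 + r) - q(a \cdot s_2 + b \cdot (1 + r))\\
a \cdot s_0(1 + r) + b(1 + r) - a \cdot s_2 - b \cdot (1 + r) &= q(a \cdot s_1 + b \cdot (1 + r) - a \cdot s_2 - b \cdot (1 + r))\\
a \cdot s_0(1 + r) - a \cdot s_2 &= q(a \cdot s_1 - a \cdot s_2)
\end{aligned}$$

and finally obtain

$$q = \frac{s_0(1 + r) - s_2}{s_1 - s_2}.$$

We now have to check that $q \in (0, 1)$ in order to obtain the new probability
measure $Q$ $\{Q(\omega_1) = q, Q(\omega_2) = 1 - q\}$. This requires $s_2 < s_0(1 + r) < s_1$. The lower bound
$s_0(1 + r) > s_2$ holds by construction for a nonnegative rate $r$, while the upper bound $s_0(1 + r) < s_1$
means that the asset outperforms the bond in the favourable state; if it did not, the bond would dominate the asset. Notice that we expect $p \ne q$, so that, while $p$ is
the probability that the asset increases its value from $s_0$ to $s_1$, for $q$ there is no
such direct interpretation. In this sense, $P$ is called the true/historical/physical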
measure while $Q$ is an instrumental measure that we only use to perform
calculations. So, using non-arbitrage and completeness, we can now write

$$P_0 = H_0 = (1 + r)^{-1}E_QH_T = (1 + r)^{-1}E_QX.$$

From the above, one can now obtain the value of $H_0$ which it is necessary to
put in the hedging portfolio. There is still the problem of how to allocate $a$
and $b$ inside the portfolio. In particular, there may be one or many solutions.
Let us determine the optimal allocation of the hedging portfolio. Denote
$x_1 = X(\omega_1)$ and $x_2 = X(\omega_2)$. If we want to have $H_T = X$, we have to solve
the following system

$$\begin{cases} a s_1 + b(1 + r) = x_1\\ a s_2 + b(1 + r) = x_2\end{cases}$$

from which

$$a = \frac{x_1 - x_2}{s_1 - s_2}, \qquad b = \frac{x_2s_1 - x_1s_2}{s_1 - s_2}(1 + r)^{-1}.$$

We now look at other properties of the new measure $Q$. Let us set $a = 1$
and $b = 0$, i.e. we construct a portfolio of only assets. Then, from the above
equalities, the relation

$$H_0 = (1 + r)^{-1}E_QH_T$$

reduces to

$$s_0 = (1 + r)^{-1}E_QS_T.$$

This means that investing in risky assets has the same return as a bond if we
perform our calculations under the measure $Q$ instead of using the physical
measure $P$. Again, $Q$ removes, in a sense, the risk from the evaluation of
market activities.
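The whole one-period construction fits in a few lines of R. The following sketch (the function name `one_period` and the numerical values are illustrative assumptions) computes the risk neutral probability $q$, the hedging quantities $a$ and $b$, and the price both as the discounted $Q$-expectation of the payoff and as the initial cost of the hedging portfolio; the two prices must coincide.

```r
# One-period market: asset worth s0 today and s1 or s2 at time T,
# bond growing from 1 to 1 + r, claim paying x1 or x2.
one_period <- function(s0, s1, s2, r, x1, x2) {
  q <- (s0 * (1 + r) - s2) / (s1 - s2)            # risk neutral probability
  a <- (x1 - x2) / (s1 - s2)                      # units of the risky asset
  b <- (x2 * s1 - x1 * s2) / (s1 - s2) / (1 + r)  # units of the bond
  price_q <- (q * x1 + (1 - q) * x2) / (1 + r)    # discounted E_Q X
  price_h <- a * s0 + b                           # cost H_0 of the hedge
  c(q = q, a = a, b = b, price_q = price_q, price_h = price_h)
}

# Call option with strike 100 on an asset moving from 100 to 120 or 90.
res <- one_period(s0 = 100, s1 = 120, s2 = 90, r = 0.05,
                  x1 = max(120 - 100, 0), x2 = max(90 - 100, 0))
res
```

In this example $q = 0.5$, the hedge holds $2/3$ of the asset financed by borrowing about 57.14 in bonds, and both prices equal $10/1.05 \approx 9.52$. The physical probability $p$ never enters the calculation.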

So, we have seen that if a measure Q exists, it allows along with completeness 
and non-arbitrage, to have fair prices, hedging portfolios (and their composition) 
and, moreover, the randomness of the underlying process is neutralized (the asset 
price process behaves as a bond). We will see that, in general, the measure Q 
maintains these properties in the general setup with some differences like, for 
example, that all risky activities behave like martingales (not simply as bonds). 

To conclude this discussion, we need a more in-depth explanation of why 
the requirement of a fair price is equivalent to asking that $P_0 = H_0$. We have 
to prove that this requirement is unavoidable. Let us assume that on the market 
the contingent claim is offered at a price $\tilde P_0$ which is lower than the fair 
price $P_0 = H_0$ and let us prove that we can make arbitrage, i.e. make profit 
without risk. Of course, we have to assume that we know $P_0$. 

We can short-sell N hedging portfolios, each worth $H_0$, for a total of $N H_0$. 
Given that $\tilde P_0 < P_0$ we can buy N of the above claims at the cost of 
$N \tilde P_0$ and we still have a surplus value of $N(H_0 - \tilde P_0)$. Let us buy 
bonds for that amount. Notice that we don't need to really own the claims, we 
just need to sell a portfolio in order to obtain a value of $H_0$ to start our 
speculation, without putting in a penny of our own. At maturity T the portfolio 
we sold will have a value of $N H_T$ and, because it is a hedging portfolio, 
$N H_T = N X$ where $N X$ is the true value of the contingent claims. So the 
claims we hold give us enough money to cover the short position. But also our 
bonds are at maturity and we get a strictly positive profit of 
$N(H_0 - \tilde P_0)(1 + r)$. Clearly, this risk-less profit is null if 
$\tilde P_0 = H_0$, i.e. if $\tilde P_0 = P_0$. A similar argument holds for 
the symmetric case $\tilde P_0 > P_0$. 
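
The bookkeeping of this argument can be sketched with a few lines of R. All numbers below are hypothetical and serve only to trace the cash flows: the short sale funds the purchase of the claims, the surplus goes into bonds, and at maturity only the bond position is left.

```r
# Hypothetical numbers, chosen only to illustrate the argument
H0      <- 10     # value today of the hedging portfolio (the fair price)
P.tilde <- 9.5    # price at which the market offers the claim
N       <- 1000   # number of claims traded
r       <- 0.05   # one-period interest rate

cash.in  <- N * H0               # received from short-selling N hedging portfolios
cash.out <- N * P.tilde          # paid for N claims
surplus  <- cash.in - cash.out   # invested in bonds, so nothing comes out of pocket

# At maturity the claims pay N * X, which is exactly what is owed on the
# short position (H_T = X), so only the bonds remain.
profit <- surplus * (1 + r)
profit
```

The profit is proportional to the mispricing H0 - P.tilde and vanishes when the market price equals the fair price.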

So, the one-period market is a very simple set up but gives an overview of 
the general approach to the option pricing theorem. 

Notice that in reality markets are incomplete because not all claims are 
completely replicable. If it were the case, instead of buying a derivative one 
could just buy the equivalent hedging portfolio. In some cases, especially if the 
underlying asset follows a Lévy model, there will be more than one way to 
replicate a claim (more than one measure Q) and we have to choose one among the 
others. Conversely, notice that the price $P_0$ and the hedging portfolio are 
obtained via the risk-neutral measure, so both are independent of the physical 
measure P. But to obtain a risk-neutral measure we need to assume non-arbitrage 
and hedging. So we cannot construct the portfolio without passing through Q. 

6.1.3 The Black and Scholes market 

We now consider the continuous time case. We assume a financial 
market with only one asset (risky investment), a bond and a contingent claim. 
The dynamics of the underlying asset is represented by the geometric Brownian 
motion 

$$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$$

while interest rates follow the nonstochastic differential equation 

$$dR_t = r R_t\,dt.$$

For simplicity we assume $R_0 = 1$. The constant rate r is, for example, the 
one-year interest rate on the bonds. The solution to the differential equation 
for $R_t$ is $R_t = \exp(rt)$, which can be easily obtained as follows: 


$$dR_t = r R_t\,dt \iff \frac{dR_t}{dt} = r R_t \iff \frac{dR_t}{dt}\,\frac{1}{R_t} = r;$$

taking the limit as $dt \to 0$ 

$$\frac{R_t'}{R_t} = r \iff \frac{d}{dt}\log R_t = r \iff \log R_t - \log R_0 = \int_0^t r\,du = rt \iff \frac{R_t}{R_0} = e^{rt};$$

the result follows from the fact that $R_0 = 1$. The discount rate $r_a$ can be 
obtained as the solution of the equation $(1 + r_a)^t = \exp(rt)$ in the 
following manner: 

$$(1 + r_a)^t = \exp(rt) \iff \log(1 + r_a)^t = rt \iff t\log(1 + r_a) = rt \iff 1 + r_a = e^r \iff r_a = e^r - 1.$$
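
A quick numerical check of both results may help. The sketch below uses assumed values of r and t, integrates the bond equation with a plain Euler scheme, and compares the outcome with exp(rt) and with compounding at the equivalent annual rate.

```r
r <- 0.05          # continuously compounded rate (assumed value)
t <- 2             # horizon in years (assumed value)

# Euler scheme for dR_t = r R_t dt with R_0 = 1
n  <- 100000
dt <- t / n
R  <- 1
for (i in 1:n) R <- R + r * R * dt

# Annual rate equivalent to continuous compounding at rate r
ra <- exp(r) - 1

c(euler = R, exact = exp(r * t), annual = (1 + ra)^t)
```

The Euler value approaches exp(rt) as the step shrinks, while compounding at the rate ra reproduces it exactly.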

Let $P(t) = P_t$ be the price of the contingent claim at time t. We need to derive 
the dynamics of $P_t$ to complete the description of how the market evolves in 
continuous time. We assume that all stochastic processes are now adapted to the 
filtration generated by the Brownian motion without explicitly mentioning it. 

6.1.4 Portfolio strategies 

In the Black and Scholes market a portfolio can be composed of assets, bonds 
and claims. We denote by a(t) the amount of assets in the portfolio at time t, by 
b(t) the amount of bonds and by c(t) the amount of claims. 

Definition 6.1.2 A portfolio strategy is a triplet (a, b, c) of stochastic processes 
and the value of the corresponding portfolio at time t is given by 

$$V_t = a_t S_t + b_t R_t + c_t P_t.$$


The processes (a, b, c) are supposed to be adapted, though in the simplest case, 
as we will see, they are supposed to be constant. In both cases, the process 
$V_t$ is also an adapted stochastic process. This means that an investor owning 
such a portfolio does not have more information than what the market reveals 
up to time t. In the Black and Scholes market, portfolios have an additional 
constraint. 

Definition 6.1.3 A portfolio is said to be a self-financing portfolio if and only if 
its value at time t depends only on the variation of the processes and the triplet 
(a, b, c), i.e. it is not possible to add value or remove value from the portfolio 
but it is only possible to adjust the strategy ( a , b, c). Therefore, a self-financing 
portfolio verifies 

$$dV_t = a_t\,dS_t + b_t\,dR_t + c_t\,dP_t.$$

Remember that the writing 

$$dV_t = a_t\,dS_t + b_t\,dR_t + c_t\,dP_t$$

is interpreted as 

$$V_t = V_0 + \int_0^t a_u\,dS_u + \int_0^t b_u\,dR_u + \int_0^t c_u\,dP_u.$$

In order to complete the description we have to discuss the dynamics of $P_t$ to 
make sense of the last stochastic integral. To this end we will make use of the 
Itô formula. 

6.1.5 Arbitrage and completeness 

We now need to clarify the meaning of non-arbitrage and completeness of the 
market in the continuous time framework. Remember that arbitrage means the 
possibility of starting with a null or negative investment and ending, almost 
surely, with a nonnegative value that is positive on average. Technically, this 
notion is defined as follows: 

Definition 6.1.4 A self-financing strategy is called an arbitrage opportunity if it 
satisfies 

$$V_0 \le 0, \quad V_T \ge 0, \quad E V_T > 0.$$

Completeness requires the existence of a replicating or hedging portfolio 
composed only of assets and bonds which is able to cover the risk of the payoff 
associated with the contingent claim. We denote the hedging portfolio by $H_t$ 
and the corresponding strategy by $(a^H, b^H)$. At time t, the value of the 
hedging portfolio is 

$$H_t = a_t^H S_t + b_t^H R_t.$$

Definition 6.1.5 A portfolio strategy $(a^H, b^H)$ with value 

$$H_t = a_t^H S_t + b_t^H R_t$$

is called a hedging portfolio for a contingent claim c if 

$$H_T = X$$

where X is the payoff at exercise time of the contingent claim c. 

We say that the market is complete if for all contingent claims there exists a 
hedging portfolio. 

Under some circumstances, it is possible to prove that, given the above 
conditions, there can still exist portfolio strategies that, starting with an 
initial value of zero, produce positive investments. These are called doubling 
strategies and, in order to exclude them from the Black and Scholes market, it is 
necessary to make the following assumption: there exists K > 0 such that 
$V_t \ge -K$ for all $t \in [0, T]$. 

6.1.6 Derivation of the Black and Scholes equation 

We recall that we have denoted the payoff of the contingent claim as 

$$X = f(S_T).$$

We are interested in the price of the contingent claim at time t = 0 and we 
know that the payoff is the value of the contingent claim at time t = T. Further, 
the market is free to sell and buy derivatives even before maturity, so there is 
the need to consider the price of the claim for all times $t \in [0, T]$. We denote 
by $C(t, S_t) = P_t$ the price of the contingent claim at time t, which of course 
depends on the payoff function f and the underlying asset S. Notice that, at time 
T, $P_T = C(T, S_T) = f(S_T) = X$. If we want to derive a stochastic differential 
equation for $P_t$ we need to apply the Itô formula to $f(S_T)$, so we need to 
require that $E f(S_T)^2 < \infty$. 

Now our task will be to derive an equation for $P_t = C(t, S_t)$ for $t < T$ which 
will provide the price of the option $P_0 = C(0, S_0)$. We denote by C(t, x) the 
generic function which represents the price of the contingent claim. 

Theorem 6.1.6 Let C(t, x) be the function that describes the price $P_t = C(t, S_t)$ 
of the contingent claim. Assume that C(t, x) is differentiable in t and two times 
differentiable in x. Then, C(t, x) solves the following (nonstochastic) partial 
differential equation: 

$$rC(t, x) = C_t(t, x) + rxC_x(t, x) + \tfrac12\sigma^2 x^2 C_{xx}(t, x). \tag{6.1}$$


Equation (6.1) is called the Black and Scholes equation. 


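Proof. We start by applying the Itô formula to $P_t = C(t, S_t)$: 

$$\begin{aligned}
dP_t &= C_t(t, S_t)\,dt + C_x(t, S_t)\,dS_t + \tfrac12 C_{xx}(t, S_t)(dS_t)^2 \\
&= C_t(t, S_t)\,dt + C_x(t, S_t)\,(\mu S_t\,dt + \sigma S_t\,dB_t) + \tfrac12 C_{xx}(t, S_t)\,(\mu S_t\,dt + \sigma S_t\,dB_t)^2 \\
&= C_t(t, S_t)\,dt + C_x(t, S_t)\,(\mu S_t\,dt + \sigma S_t\,dB_t) \\
&\quad + \tfrac12 C_{xx}(t, S_t)\left(\mu^2 S_t^2 (dt)^2 + \sigma^2 S_t^2 (dB_t)^2 + 2\mu\sigma S_t^2\,dt\,dB_t\right)
\end{aligned}$$

and, recalling that $(dB_t)^2 = dt$ and dropping the terms of order $(dt)^2$ and 
$dt\,dB_t$, we obtain 

$$dP_t = \left\{C_t(t, S_t) + \mu S_t C_x(t, S_t) + \tfrac12\sigma^2 S_t^2 C_{xx}(t, S_t)\right\}dt + \sigma S_t C_x(t, S_t)\,dB_t. \tag{6.2}$$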


We now make use of the assumption of completeness and non-arbitrage. 
Completeness of the market means that there exists a hedging portfolio strategy 
$(a^H, b^H)$ such that $H_T = X$, and non-arbitrage implies that $H_t = P_t$ for all 
$t < T$. Therefore, we have the following equality 

$$\begin{aligned}
dP_t = dH_t &= a_t^H\,dS_t + b_t^H\,dR_t \\
&= a_t^H\{\mu S_t\,dt + \sigma S_t\,dB_t\} + b_t^H r R_t\,dt \\
&= \{\mu a_t^H S_t + r b_t^H R_t\}\,dt + \sigma a_t^H S_t\,dB_t.
\end{aligned} \tag{6.3}$$

Comparing the last line of (6.3) and (6.2) we realize that, necessarily, we have 

$$a_t^H = C_x(t, S_t). \tag{6.4}$$

Therefore, from 

$$C(t, S_t) = P_t = H_t = a_t^H S_t + b_t^H R_t$$

we also obtain 

$$b_t^H = \frac{C(t, S_t) - a_t^H S_t}{R_t}. \tag{6.5}$$

Now replacing (6.4) and (6.5) into (6.3) we obtain 

$$\begin{aligned}
dP_t &= \mu C_x(t, S_t) S_t\,dt + r R_t^{-1}\left[C(t, S_t) - a_t^H S_t\right] R_t\,dt + \sigma C_x(t, S_t) S_t\,dB_t \\
&= \{\mu C_x(t, S_t) S_t - r C_x(t, S_t) S_t + r C(t, S_t)\}\,dt + \sigma C_x(t, S_t) S_t\,dB_t.
\end{aligned} \tag{6.6}$$

Finally, equating (6.6) and (6.2) we obtain 

$$\mu C_x(t, S_t) S_t - r C_x(t, S_t) S_t + r C(t, S_t) = C_t(t, S_t) + \mu C_x(t, S_t) S_t + \tfrac12\sigma^2 C_{xx}(t, S_t) S_t^2$$

and then 

$$rC(t, S_t) = C_t(t, S_t) + r S_t C_x(t, S_t) + \tfrac12\sigma^2 S_t^2 C_{xx}(t, S_t).$$

Therefore, C(t, x) satisfies (6.1). 
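
Two elementary payoffs give a quick sanity check of equation (6.1) and of the hedging strategy (6.4)-(6.5). They are not part of the original derivation and are offered only as an illustration: a claim that pays the asset itself, and a claim that pays a fixed amount K.

```latex
% Claim paying the asset itself, f(x) = x: the price is C(t,x) = x
C_t = 0,\quad C_x = 1,\quad C_{xx} = 0
\;\Longrightarrow\;
C_t + r x C_x + \tfrac12\sigma^2 x^2 C_{xx} = r x = r C,
\qquad a^H_t = 1,\quad b^H_t = 0.

% Claim paying a fixed amount, f(x) = K: the price is C(t,x) = K e^{-r(T-t)}
C_t = r K e^{-r(T-t)} = r C,\quad C_x = C_{xx} = 0
\;\Longrightarrow\;
C_t + r x C_x + \tfrac12\sigma^2 x^2 C_{xx} = r C,
\qquad a^H_t = 0,\quad b^H_t = \frac{C(t,S_t)}{R_t} = K e^{-rT}.
```

In the first case the hedge is simply to hold one unit of the asset; in the second it is to hold a constant amount of bonds, as one would expect.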

Theorem 6.1.6 says nothing about the explicit price $C(t, x) = P_t$. To derive 
the expression of the price we need to solve the Black and Scholes equation 

$$rC(t, x) = C_t(t, x) + rxC_x(t, x) + \tfrac12\sigma^2 x^2 C_{xx}(t, x)$$

for all x > 0. Notice that, at time T, $C(T, S_T) = f(S_T)$, which implies that we 
have to find solutions satisfying $C(T, x) = f(x)$, i.e. a condition only in terms 
of the variable x. We will discuss this problem in the next section. Before that, 
we look again at the necessity to impose the non-arbitrage conditions. Suppose 
again that the market sells the claim at a price $\tilde P_0 < H_0$. We assume that 
both the real price $P_t$ and the wrong price $\tilde P_t$ are adapted processes. 
Given that the claim costs less than it should, we buy one unit and put it in our 
portfolio, i.e. we set $c_t = 1$ for all $t \in [0, T]$. Let us short-sell a 
portfolio $H_0$ which is a hedging portfolio for the claim and with the surplus 
$H_0 - \tilde P_0 > 0$ we buy bonds. Our portfolio strategy is then 

$$(-a_t^H,\; -b_t^H + H_0 - \tilde P_0,\; 1).$$

This portfolio has an initial value of 0, i.e. $V_0 = 0$, while at time T we get 

$$\begin{aligned}
V_T &= -H_T + (H_0 - \tilde P_0) R_T + \tilde P_T \\
&= -X + (H_0 - \tilde P_0) R_T + X \\
&= (H_0 - \tilde P_0) R_T > 0
\end{aligned}$$

thus a strictly positive value. It is an arbitrage opportunity if we also show that 
the portfolio is self-financing, i.e. if 

$$dV_t = a_t\,dS_t + b_t\,dR_t + c_t\,d\tilde P_t.$$

In our case we have 

$$V_t = -H_t + (H_0 - \tilde P_0) R_t + \tilde P_t$$

or 

$$\begin{aligned}
dV_t &= -dH_t + (H_0 - \tilde P_0)\,dR_t + d\tilde P_t \\
&= -a_t^H\,dS_t + (-b_t^H + H_0 - \tilde P_0)\,dR_t + d\tilde P_t
\end{aligned}$$

so it is a real self-financing portfolio, and we have realized arbitrage. 

6.2 Solution of the Black and Scholes equation 

Let $B_t$ be a standard Brownian motion and define the following new process 

$$B_t^x = x + B_t$$

with x some constant. Then, $B_t^x$ is still a Brownian motion which starts from 
x at time 0. If we want a Brownian motion which is at x at time t we need to 
introduce the process 

$$B_s^{t,x} = x + B_s - B_t, \quad s \ge t,$$

which is called translated Brownian motion. In a similar manner we can define 
the translated geometric Brownian motion as follows: 

$$Z_s^{t,x} = x + \int_t^s r Z_u^{t,x}\,du + \int_t^s \sigma Z_u^{t,x}\,dB_u,$$

which is a geometric Brownian motion which is at x at time t. The process 
$\{Z_s^{t,x}, s \ge t\}$ satisfies the stochastic differential equation 

$$dZ_s^{t,x} = r Z_s^{t,x}\,ds + \sigma Z_s^{t,x}\,dB_s$$

with the following explicit solution: 

$$Z_s^{t,x} = x \exp\left\{\left(r - \frac{\sigma^2}{2}\right)(s - t) + \sigma(B_s - B_t)\right\}. \tag{6.7}$$
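
The explicit solution (6.7) makes simulation immediate, since only the Gaussian increment $B_s - B_t$ has to be drawn. The following sketch uses assumed parameter values and checks the simulated mean of $Z_s^{t,x}$ against $x e^{r(s-t)}$, the expected value of a geometric Brownian motion with drift r started at x.

```r
set.seed(1)
x     <- 100    # value of the process at time t (assumed)
t     <- 0.5    # starting time (assumed)
s     <- 1.5    # later time, s >= t (assumed)
r     <- 0.05   # drift (assumed)
sigma <- 0.2    # volatility (assumed)
n     <- 200000 # number of simulated copies

# B_s - B_t is Gaussian with mean 0 and variance s - t
dB <- rnorm(n, mean = 0, sd = sqrt(s - t))

# Explicit solution (6.7)
Z <- x * exp((r - 0.5 * sigma^2) * (s - t) + sigma * dB)

# The sample mean should be close to x * exp(r * (s - t))
c(simulated = mean(Z), theoretical = x * exp(r * (s - t)))
```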


Before proving that C(t, x) in Theorem 6.1.6 is a solution of the Black and 
Scholes equation (6.1) we show a similar fact for the translated Brownian 
motion. Notice that (6.1) is a standard partial differential equation without any 
random term, so no stochastic calculus is needed to get the result. But let us 
first consider the process $B_t^x = x + B_t$. The random variable $B_t^x$ is normally 
distributed with mean x and variance t. Thus, for any function g(x), we can 
write $E g(B_t^x)$ as follows: 

$$E g(B_t^x) = \int_{-\infty}^{+\infty} g(y)\,p(t, x - y)\,dy$$

where $p(t, z) = \frac{1}{\sqrt{2\pi t}}\,e^{-\frac{z^2}{2t}}$ is the density of the N(0, t) random 
variable. We want to prove that $v(t, x) = E g(B_T^{t,x})$ is a solution of a 
particular partial differential equation subject to the final condition 
$v(T, x) = g(x)$. It is an easy exercise to prove that p(t, z) solves the following 
partial differential equation: 

$$\frac{\partial}{\partial t}p(t, z) = \frac12\frac{\partial^2}{\partial z^2}p(t, z)$$

called the heat equation. Indeed, 

$$\frac{\partial}{\partial t}p(t, z) = p(t, z)\,\frac12\left(\frac{z^2}{t^2} - \frac1t\right),$$

$$\frac{\partial}{\partial z}p(t, z) = -p(t, z)\,\frac{z}{t},$$

$$\frac{\partial^2}{\partial z^2}p(t, z) = -\frac1t\,p(t, z) + \frac{z^2}{t^2}\,p(t, z).$$

Now, let us define $u(t, x) = E g(B_t^x)$. By direct differentiation (we assume it is 
possible to exchange integration and derivation) we also get that 

$$\frac{\partial}{\partial t}u(t, x) = \frac12\frac{\partial^2}{\partial x^2}u(t, x)$$

with initial condition 

$$u(0, x) = E g(B_0^x) = E g(x) = g(x).$$
The next step is to reverse time in order to get the translated Brownian motion. 
We define a new function $v(t, x) = u(T - t, x)$ so that 
$\frac{\partial}{\partial t}v(t, x) = -u_t(T - t, x)$, where $u_t$ denotes the derivative of u 
with respect to its first argument. Therefore 

$$\frac{\partial}{\partial t}v(t, x) + \frac12\frac{\partial^2}{\partial x^2}v(t, x) = 0$$

with terminal condition $v(T, x) = u(0, x) = g(x)$. Now we recall that 

$$v(t, x) = u(T - t, x) = E g(B_{T-t}^x) = E g(B_T^{t,x})$$

because $B_{T-t}^x = x + B_{T-t} \sim x + B_T - B_t = B_T^{t,x}$. So we have proved that 
$v(t, x) = E g(B_T^{t,x})$ with final condition $v(T, x) = g(x)$ is a solution of the 
partial differential equation 

$$\frac{\partial}{\partial t}v(t, x) + \frac12\frac{\partial^2}{\partial x^2}v(t, x) = 0. \tag{6.8}$$
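
Equation (6.8) can be checked numerically for a concrete choice of g. The sketch below takes $g(y) = y^2$ (an arbitrary test function, for which $v(t, x) = x^2 + (T - t)$), evaluates $v(t, x) = E g(x + B_T - B_t)$ by a simple Riemann sum against the $N(x, T - t)$ density, and approximates the derivatives with central finite differences. Grid sizes and evaluation point are assumptions of the example.

```r
g <- function(y) y^2      # test function (an arbitrary choice)
T <- 1                    # final time (assumed)

# v(t, x) = E g(x + B_T - B_t): Riemann sum against the N(x, T - t) density
v <- function(t, x) {
    s  <- sqrt(T - t)
    y  <- seq(x - 10 * s, x + 10 * s, length.out = 4001)
    dy <- y[2] - y[1]
    sum(g(y) * dnorm(y, mean = x, sd = s)) * dy
}

t <- 0.3
x <- 0.7
h <- 0.05

# Central finite differences for v_t and v_xx
vt  <- (v(t + h, x) - v(t - h, x)) / (2 * h)
vxx <- (v(t, x + h) - 2 * v(t, x) + v(t, x - h)) / h^2

c(v = v(t, x), closed.form = x^2 + (T - t), residual = vt + 0.5 * vxx)
```

The residual of (6.8) should be numerically zero, and the terminal condition holds in the limit $t \to T$ because the Gaussian density concentrates at x.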


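Theorem 6.2.1 The solution C(t, x) of (6.1) is 

$$C(t, x) = e^{-r(T-t)} E f(Z_T^{t,x}). \tag{6.9}$$

Proof. We start by considering that $Y = \log Z_T^{t,x}$ is, by (6.7), a Gaussian 
random variable 

$$Y \sim N\left(\log x + \left(r - \tfrac12\sigma^2\right)(T - t),\; \sigma^2(T - t)\right)$$

with density 

$$p(t, y; x) = \frac{1}{\sqrt{2\pi\sigma^2(T - t)}}\exp\left\{-\frac{\left(y - \log x - \left(r - \frac12\sigma^2\right)(T - t)\right)^2}{2\sigma^2(T - t)}\right\}$$

and we put in evidence the variable x in the notation p(t, y; x). Now we can 
write C(t, x) in this form: 

$$C(t, x) = e^{-r(T-t)}\int_{-\infty}^{+\infty} f(e^y)\,p(t, y; x)\,dy.$$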
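In order to prove that C(t, x) solves (6.1) we first need to evaluate the partial 
derivatives of p(t, y; x): 

$$\frac{\partial}{\partial t}p(t, y; x) = p(t, y; x)\left\{\frac{1}{2(T-t)} - \frac{\left(r - \frac{\sigma^2}{2}\right)\left(y - \log x - \left(r - \frac{\sigma^2}{2}\right)(T-t)\right)}{\sigma^2(T-t)} - \frac{\left(y - \log x - \left(r - \frac{\sigma^2}{2}\right)(T-t)\right)^2}{2\sigma^2(T-t)^2}\right\},$$

$$\frac{\partial}{\partial x}p(t, y; x) = p(t, y; x)\,\frac{y - \log x - \left(r - \frac{\sigma^2}{2}\right)(T-t)}{x\sigma^2(T-t)},$$

$$\frac{\partial^2}{\partial x^2}p(t, y; x) = p(t, y; x)\left\{\frac{\left(y - \log x - \left(r - \frac{\sigma^2}{2}\right)(T-t)\right)^2}{x^2\sigma^4(T-t)^2} - \frac{y - \log x - \left(r - \frac{\sigma^2}{2}\right)(T-t)}{x^2\sigma^2(T-t)} - \frac{1}{x^2\sigma^2(T-t)}\right\}.$$

It is easy to see that 

$$\frac{\partial}{\partial t}p(t, y; x) + rx\frac{\partial}{\partial x}p(t, y; x) + \frac12\sigma^2 x^2\frac{\partial^2}{\partial x^2}p(t, y; x) = 0.$$

Now, direct calculations show the following 

$$\begin{aligned}
\frac{\partial}{\partial t}C(t, x) &= rC(t, x) + e^{-r(T-t)}\frac{\partial}{\partial t}\int_{-\infty}^{+\infty} f(e^y)\,p(t, y; x)\,dy \\
&= rC(t, x) + e^{-r(T-t)}\int_{-\infty}^{+\infty} f(e^y)\,\frac{\partial}{\partial t}p(t, y; x)\,dy \\
&= rC(t, x) - e^{-r(T-t)}\int_{-\infty}^{+\infty} f(e^y)\left\{rx\frac{\partial}{\partial x}p(t, y; x) + \frac12\sigma^2 x^2\frac{\partial^2}{\partial x^2}p(t, y; x)\right\}dy \\
&= rC(t, x) - rx\frac{\partial}{\partial x}C(t, x) - \frac12\sigma^2 x^2\frac{\partial^2}{\partial x^2}C(t, x).
\end{aligned}$$

So we have proved that C(t, x) solves the Black and Scholes equation (6.1). To 
finish the proof, we need to show that the boundary condition is also fulfilled. 
Indeed, 

$$C(T, x) = e^{-r(T-T)} E f(Z_T^{T,x}) = E f(e^{\log x}) = E f(x) = f(x)$$

which is what we needed. 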
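
The integral representation used in the proof can also be evaluated numerically. The sketch below is only an illustration with assumed contract parameters: it approximates $e^{-r(T-t)}\int f(e^y)p(t, y; x)\,dy$ for the call payoff $f(z) = \max(0, z - K)$ with a Riemann sum and compares it with the closed-form call price stated in Theorem 6.2.2.

```r
# Assumed contract parameters
x <- 100; K <- 110; r <- 0.05; sigma <- 0.25; t <- 0; T <- 0.25
f <- function(z) pmax(0, z - K)       # call payoff

# Mean and standard deviation of Y = log Z_T^{t,x}
tau <- T - t
m   <- log(x) + (r - 0.5 * sigma^2) * tau
s   <- sigma * sqrt(tau)

# Riemann sum of f(e^y) p(t, y; x) over a wide grid
y  <- seq(m - 10 * s, m + 10 * s, length.out = 20001)
dy <- y[2] - y[1]
C.num <- exp(-r * tau) * sum(f(exp(y)) * dnorm(y, mean = m, sd = s)) * dy

# Closed-form call price of Theorem 6.2.2
d2 <- (log(x / K) + (r - 0.5 * sigma^2) * tau) / s
d1 <- d2 + s
C.bs <- x * pnorm(d1) - K * exp(-r * tau) * pnorm(d2)

c(numerical = C.num, closed.form = C.bs)
```

The same Riemann sum works for any payoff f that can be evaluated on a grid, which is convenient when no closed formula is available.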
6.2.1 European call and put prices 

The above calculation of the Black and Scholes price can be made explicit in the 
case of the European call. 

Theorem 6.2.2 Let $f(x) = \max(0, x - K)$ be the payoff function of the European 
call with strike price K. Then, the solution of 

$$C(t, x) = e^{-r(T-t)} E f(Z_T^{t,x})$$

is 

$$P_t = C(t, S_t) = S_t\Phi(d_1) - e^{-r(T-t)}K\Phi(d_2)$$

with 

$$d_1 = d_2 + \sigma\sqrt{T - t}, \qquad d_2 = \frac{\ln\frac{S_t}{K} + \left(r - \frac12\sigma^2\right)(T - t)}{\sigma\sqrt{T - t}}.$$

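Proof. Let us rewrite $\sigma(B_T - B_t)$ as 

$$\sigma(B_T - B_t) = \sigma\sqrt{T - t}\cdot Y$$

with $Y = (B_T - B_t)/\sqrt{T - t} \sim N(0, 1)$. In order to calculate 
$E f(Z_T^{t,x})$ we rewrite the quantity of interest as follows: 

$$\begin{aligned}
E\left\{\max(0, Z_T^{t,x} - K)\right\} &= E\left\{\max\left(0, e^{\ln Z_T^{t,x}} - K\right)\right\} \\
&= E\left\{\max\left(0, e^{\ln x + \left(r - \frac12\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\,Y} - K\right)\right\}.
\end{aligned}$$

The payoff is zero if $Z_T^{t,x}$ is lower than the strike price K and hence those 
trajectories do not contribute to the expected value. Let us restrict the 
calculation of the expected value to those trajectories such that we have a 
positive payoff. We have that 

$$\max\left(0, e^{\ln x + \left(r - \frac12\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\cdot Y} - K\right) = 0$$

if 

$$e^{\ln x + \left(r - \frac12\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\cdot Y} < K = e^{\log K}$$

or, better, if 

$$\log x + \left(r - \tfrac12\sigma^2\right)(T - t) + \sigma\sqrt{T - t}\cdot Y < \log K.$$

Therefore, we have 

$$Y < \frac{\log K - \log x - \left(r - \frac12\sigma^2\right)(T - t)}{\sigma\sqrt{T - t}}$$

which we can rewrite as $Y < -d_2$, with 

$$d_2 = \frac{\ln\frac{x}{K} + \left(r - \frac12\sigma^2\right)(T - t)}{\sigma\sqrt{T - t}}.$$

Thus 

$$\begin{aligned}
E\left\{\max(0, Z_T^{t,x} - K)\right\} &= E\left\{(Z_T^{t,x} - K)\,\mathbf{1}_{\{Y > -d_2\}}\right\} \\
&= \int_{-d_2}^{\infty}\left(e^{\ln x + \left(r - \frac12\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\,y} - K\right)\phi(y)\,dy \\
&= xe^{r(T-t)}\int_{-d_2}^{\infty} e^{-\frac{\sigma^2(T-t)}{2} + \sigma\sqrt{T-t}\,y}\,\phi(y)\,dy - K\int_{-d_2}^{\infty}\phi(y)\,dy
\end{aligned}$$

where $\phi(y)$ is the density function of the standard Gaussian random variable, i.e. 
$\phi(y) = e^{-y^2/2}/\sqrt{2\pi}$. By symmetry of the Gaussian density, we have that 

$$\int_{-d_2}^{\infty}\phi(y)\,dy = P(Y > -d_2) = P(Y < d_2) = \Phi(d_2)$$

and then 

$$E\left\{\max(0, Z_T^{t,x} - K)\right\} = xe^{r(T-t)}\int_{-d_2}^{\infty} e^{-\frac{\sigma^2(T-t)}{2} + \sigma\sqrt{T-t}\,y}\,\phi(y)\,dy - K\Phi(d_2).$$

We now change the variable of integration in the first integral 

$$\int_{-d_2}^{\infty} e^{-\frac{\sigma^2(T-t)}{2} + \sigma\sqrt{T-t}\,y}\,\phi(y)\,dy$$

as $z = y - \sigma\sqrt{T - t}$ and obtain 

$$\begin{aligned}
&\int_{-d_2 - \sigma\sqrt{T-t}}^{\infty} e^{-\frac{\sigma^2(T-t)}{2} + \sigma\sqrt{T-t}\left(z + \sigma\sqrt{T-t}\right)}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{\left(z + \sigma\sqrt{T-t}\right)^2}{2}}\,dz \\
&\quad = \int_{-d_2 - \sigma\sqrt{T-t}}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac12\sigma^2(T-t) + \sigma\sqrt{T-t}\,z + \sigma^2(T-t) - \frac12 z^2 - \frac12\sigma^2(T-t) - z\sigma\sqrt{T-t}}\,dz \\
&\quad = \int_{-d_2 - \sigma\sqrt{T-t}}^{\infty}\frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}\,dz \\
&\quad = P\left(Y > -d_2 - \sigma\sqrt{T-t}\right) = P\left(Y < d_2 + \sigma\sqrt{T-t}\right) = \Phi(d_1)
\end{aligned}$$

with $d_1 = d_2 + \sigma\sqrt{T - t}$. Putting all steps together we obtain 

$$\begin{aligned}
E\left\{\max(0, Z_T^{t,x} - K)\right\} &= xe^{r(T-t)}\int_{-d_2}^{\infty} e^{-\frac{\sigma^2(T-t)}{2} + \sigma\sqrt{T-t}\,y}\,\phi(y)\,dy - K\Phi(d_2) \\
&= xe^{r(T-t)}\Phi(d_1) - K\Phi(d_2).
\end{aligned}$$

Therefore, noting that 

$$C(t, x) = e^{-r(T-t)} E f(Z_T^{t,x})$$

we obtain the statement of the theorem 

$$C(t, x) = x\Phi(d_1) - e^{-r(T-t)}K\Phi(d_2).$$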
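
As a cross-check of Theorems 6.1.6 and 6.2.2, one can verify numerically that the closed-form call price satisfies the Black and Scholes equation (6.1). The sketch below uses assumed parameters and a helper named bs.call (a hypothetical name introduced here), and approximates $C_t$, $C_x$ and $C_{xx}$ by central finite differences.

```r
# Closed-form call price of Theorem 6.2.2 (helper name is hypothetical)
bs.call <- function(x, t, T, r, sigma, K) {
    d2 <- (log(x / K) + (r - 0.5 * sigma^2) * (T - t)) / (sigma * sqrt(T - t))
    d1 <- d2 + sigma * sqrt(T - t)
    x * pnorm(d1) - K * exp(-r * (T - t)) * pnorm(d2)
}

# Assumed parameters and evaluation point
T <- 1; r <- 0.05; sigma <- 0.25; K <- 110
x <- 100; t <- 0.4
C <- function(t, x) bs.call(x, t, T, r, sigma, K)

# Central finite differences
ht <- 1e-4; hx <- 1e-2
Ct  <- (C(t + ht, x) - C(t - ht, x)) / (2 * ht)
Cx  <- (C(t, x + hx) - C(t, x - hx)) / (2 * hx)
Cxx <- (C(t, x + hx) - 2 * C(t, x) + C(t, x - hx)) / hx^2

# Both sides of equation (6.1)
lhs <- r * C(t, x)
rhs <- Ct + r * x * Cx + 0.5 * sigma^2 * x^2 * Cxx
c(lhs = lhs, rhs = rhs, difference = lhs - rhs)
```

The two sides should agree up to the discretization error of the finite differences.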

Similar calculations can be carried out for the price of the European put option. 

Theorem 6.2.3 Let $f(x) = \max(0, K - x)$ be the payoff function of the European 
put with strike price K. Then, the solution of 

$$C(t, x) = e^{-r(T-t)} E f(Z_T^{t,x})$$

is 

$$P_t = C(t, S_t) = e^{-r(T-t)}K\Phi(-d_2) - S_t\Phi(-d_1) \tag{6.10}$$

with 

$$d_1 = d_2 + \sigma\sqrt{T - t}, \qquad d_2 = \frac{\ln\frac{S_t}{K} + \left(r - \frac12\sigma^2\right)(T - t)}{\sigma\sqrt{T - t}}.$$

The proof follows exactly the same steps as the one for the price of the call and 
it is omitted. The price of the put option can be obtained also via the put-call 
parity. 

6.2.2 Put-call parity 

Let us consider a portfolio $\Pi$ with only puts and calls on the same underlying 
asset and the asset itself. Assume that the put and the call have the same maturity 
date T and strike price K. Let us denote by P and C the value of the put and 
call respectively. We short sell a call, so we put a minus in our portfolio 

$$\Pi_t = S_t + P_t - C_t.$$

The value of $\Pi$ at time T is a function of the payoff of the two options. Remember 
that the payoff of a call is $\max(S_T - K, 0)$ while the payoff of the put 
is $\max(K - S_T, 0)$. Then, the final value of the portfolio will be 

$$\Pi_T = S_T + (K - S_T) - 0 = K, \quad \text{if } K \ge S_T,$$
$$\Pi_T = S_T + 0 - (S_T - K) = K, \quad \text{if } K < S_T.$$

In this case, the final value of the portfolio is always K, which is deterministic, 
and we can make arbitrage out of it if we don't set some constraints on the value 
of the put $P_t$. So, we can discount this final value K and construct a proper 
hedging portfolio whose value at time t = 0 is $H_0$ and impose the non-arbitrage 
condition 

$$Ke^{-rT} = S_0 + P_0 - C_0.$$

Therefore, given the price $C_0$ from Theorem 6.2.2 we obtain the value of $P_0$ by 
the put-call parity formula 

$$P_0 = C_0 - S_0 + Ke^{-rT}.$$

In general we have that 

$$P_t = C_t - S_t + Ke^{-r(T-t)}, \quad 0 \le t \le T. \tag{6.11}$$

Exercise 6.1 Prove that for the put option $P_t = e^{-r(T-t)}K\Phi(-d_2) - S_t\Phi(-d_1)$ 
as in Theorem 6.2.3 using the put-call parity formula (6.11). 

6.2.3 Option pricing with R 

It is quite easy to implement put and call prices with R. The next code illustrates 
the idea for the call option 

R> call.price <- function(x = 1, t = 0, T = 1, r = 1, sigma = 1, 
+     K = 1) {
+     # d2 and d1 as in Theorem 6.2.2
+     d2 <- (log(x/K) + (r - 0.5 * sigma^2) * (T - t))/(sigma * 
+         sqrt(T - t))
+     d1 <- d2 + sigma * sqrt(T - t)
+     x * pnorm(d1) - K * exp(-r * (T - t)) * pnorm(d2)
+ }

and for the put option 

R> put.price <- function(x = 1, t = 0, T = 1, r = 1, sigma = 1, 
+     K = 1) {
+     # d2 and d1 as in Theorem 6.2.3
+     d2 <- (log(x/K) + (r - 0.5 * sigma^2) * (T - t))/(sigma * 
+         sqrt(T - t))
+     d1 <- d2 + sigma * sqrt(T - t)
+     K * exp(-r * (T - t)) * pnorm(-d2) - x * pnorm(-d1)
+ }




We can now calculate the price of a contract with $S_0 = 100$, strike price K = 
110, interest rate r = 0.05 and maturity 3 months. In this case T = 1/4, i.e. one 
fourth of the year, if we consider daily data. We assume a volatility of $\sigma = 0.25$. 

R> S0 <- 100
R> K <- 110
R> r <- 0.05
R> T <- 1/4
R> sigma <- 0.25
R> C <- call.price(x = S0, t = 0, T = T, r = r, K = K, sigma = sigma)
R> C
[1] 1.980506

and for the price of the put 

R> P <- put.price(x = S0, t = 0, T = T, r = r, K = K, sigma = sigma)
R> P
[1] 10.61406

and check the put-call parity formula 

R> C - S0 + K * exp(-r * T)
[1] 10.61406

Another solution is to use the fOptions package from the Rmetrics suite (see 
Appendix B.1.1). We use the function GBSOption from fOptions, which implements 
several exact formulas for options in the generalized Black and Scholes model 

R> require(fOptions)

For the call option we need to write 

R> GBSOption(TypeFlag = "c", S = S0, X = K, Time = T, r = r, b = r, 
+     sigma = sigma)

Title:
 Black Scholes Option Valuation 

Call:
 GBSOption(TypeFlag = "c", S = S0, X = K, Time = T, r = r, b = r, 
     sigma = sigma)

Parameters:
          Value:
 TypeFlag c     
 S        100   
 X        110   
 Time     0.25  
 r        0.05  
 b        0.05  
 sigma    0.25  

Option Price:
 1.980509 

Description:
 Wed Nov 17 00:08:45 2010 

Notice that the function produces extensive output. If we just want a numeric 
value we need to access the slot price as follows: 

R> GBSOption(TypeFlag = "c", S = S0, X = K, Time = T, r = r, b = r, 
+     sigma = sigma)@price
[1] 1.980509

Note further that the generalized Black and Scholes formula includes an 
additional parameter b which is the cost of carry. In order to obtain the standard 
formulas one has to put b = r, as in our examples. For the put options, we need 
to change the argument TypeFlag from c to p 

R> GBSOption(TypeFlag = "p", S = S0, X = K, Time = T, r = r, b = r, 
+     sigma = sigma)@price
[1] 10.61407

We can make a final remark about the formula of the price of the European call 
option. From the formula 

$$P_t = C(t, S_t) = S_t\Phi(d_1) - e^{-r(T-t)}K\Phi(d_2)$$

we see immediately that, if we add the strike price K to the notation of $P_t$, i.e. 
$P_t^K = C(t, S_t, K)$, we have that 

$$aP_t^K = C(t, aS_t, aK).$$

Indeed, we can see it also numerically 

R> a <- 5
R> a * GBSOption(TypeFlag = "c", S = S0, X = K, Time = T, r = r, 
+     b = r, sigma = sigma)@price
[1] 9.902546
R> GBSOption(TypeFlag = "c", S = a * S0, X = a * K, Time = T, r = r, 
+     b = r, sigma = sigma)@price
[1] 9.902546

6.2.4 The Monte Carlo approach 

When explicit formulas for the calculation of the option price do not exist, it 
is still possible to rely on the Monte Carlo method of Section 4.1 given that the 
general price formula is given in the following form: 

$$C(t, x) = e^{-r(T-t)} E f(Z_T^{t,x}).$$

In order to evaluate the option price we need to be able to simulate independent 
copies of the random variable $Z_T^{t,x}$, which is extremely simple due to the fact that 

$$Z_T^{t,x} = x\exp\left\{\left(r - \tfrac12\sigma^2\right)(T - t) + \sigma\sqrt{T - t}\,u\right\}$$

with u ~ N(0, 1). So we need to simulate M copies of the random variable 
u and apply the transform above to obtain the values of $Z_T^{t,x}$. To each value of 
the simulated $Z_T^{t,x}$ we apply the payoff function and finally we calculate the 
average of these values and discount by the factor $e^{-r(T-t)}$. The next code 
implements the above algorithm in the function MCPrice assuming that M is the 
number of Monte Carlo replications and f a generic payoff function. To speed 
up the simulation, we use antithetic sampling, which is possible because the 
simulated random variables are symmetric. 

R> MCPrice <- function(x = 1, t = 0, T = 1, r = 1, sigma = 1, M = 1000, 
+     f) {
+     h <- function(m) {
+         # antithetic sampling: m/2 Gaussian draws u and their mirror -u
+         u <- rnorm(m/2)
+         tmp <- c(x * exp((r - 0.5 * sigma^2) * (T - t) + sigma * 
+             sqrt(T - t) * u), x * exp((r - 0.5 * sigma^2) * (T - 
+             t) + sigma * sqrt(T - t) * (-u)))
+         mean(sapply(tmp, function(xx) f(xx)))
+     }
+     p <- h(M)
+     p * exp(-r * (T - t))
+ }
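
A Monte Carlo price is more informative when it comes with a measure of its own error. The following variant is a sketch, not part of the original code: the name MCPriceSE is hypothetical, and it uses plain (not antithetic) sampling so that the usual standard error of the mean of independent draws applies. It returns the estimate together with its standard error and an approximate 95% confidence interval.

```r
# Hypothetical helper, not part of the original code: Monte Carlo price
# with standard error and approximate 95% confidence interval
MCPriceSE <- function(x = 1, t = 0, T = 1, r = 1, sigma = 1, M = 1000, f) {
    u  <- rnorm(M)                                   # independent N(0, 1) draws
    ZT <- x * exp((r - 0.5 * sigma^2) * (T - t) + sigma * sqrt(T - t) * u)
    payoff <- exp(-r * (T - t)) * sapply(ZT, f)      # discounted payoffs
    est <- mean(payoff)
    se  <- sd(payoff) / sqrt(M)                      # standard error of the mean
    c(price = est, se = se, lower = est - 1.96 * se, upper = est + 1.96 * se)
}

set.seed(123)
f <- function(x) max(0, x - 110)    # call payoff with strike 110
MCPriceSE(x = 100, t = 0, T = 1/4, r = 0.05, sigma = 0.25, M = 50000, f = f)
```

The standard error shrinks like $1/\sqrt{M}$, so halving it requires roughly four times as many replications.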

We now compare the Monte Carlo estimate of the price and the exact price of 
a European call option. 


R> S0 <- 100
R> K <- 110
R> r <- 0.05
R> T <- 1/4
R> sigma <- 0.25
R> GBSOption(TypeFlag = "c", S = S0, X = K, Time = T, r = r, b = r, 
+     sigma = sigma)@price
[1] 1.980509
R> f <- function(x) max(0, x - K)
R> set.seed(123)
R> M <- 1000
R> MCPrice(x = S0, t = 0, T = T, r = r, sigma, M = M, f = f)
[1] 1.872703
R> set.seed(123)
R> M <- 50000
R> MCPrice(x = S0, t = 0, T = T, r = r, sigma, M = M, f = f)
[1] 1.991576
R> set.seed(123)
R> M <- 1e+06
R> MCPrice(x = S0, t = 0, T = T, r = r, sigma, M = M, f = f)
[1] 1.984967

As seen from the previous example, the Monte Carlo estimate is not very precise 
unless we greatly increase the number of simulations, as is the case for 
any Monte Carlo estimate. The advantage of the Monte Carlo method is that it 
can be easily parallelized and its precision is not affected by the dimensionality 
of the problem but only by the number of replications. The next code prepares 
a parallelized version of the previous MCPrice function which simply splits the 
task of simulating the M random variables between the nodes of the cluster. 
We make use of the package foreach to distribute the task between the nodes 
of a cluster. The foreach package assumes that a cluster is already in place; 
if not, the code is executed sequentially. We modify our MCPrice function to 
work with the %dopar% operator from the foreach package 

R> MCPrice <- function(x = 1, t = 0, T = 1, r = 1, sigma = 1, M = 1000, 
+     f) {
+     require(foreach)
+     h <- function(m) {
+         u <- rnorm(m/2)
+         tmp <- c(x * exp((r - 0.5 * sigma^2) * (T - t) + sigma * 
+             sqrt(T - t) * u), x * exp((r - 0.5 * sigma^2) * (T - 
+             t) + sigma * sqrt(T - t) * (-u)))
+         mean(sapply(tmp, function(xx) f(xx)))
+     }
+     # split the M replications evenly between the registered workers
+     nodes <- getDoParWorkers()
+     p <- foreach(m = rep(M/nodes, nodes), .combine = "c") %dopar% 
+         h(m)
+     p <- mean(p)
+     p * exp(-r * (T - t))
+ }


Notice that we added the foreach command to distribute the simulations and 
the command mean(p) because, in general, the result will be a vector (one entry 
per node of the cluster). Before executing the new version of the parallelized 
code, we need to adjust the definition of the payoff function f, which requires 
the parameter K. This parameter is not passed to the nodes of the cluster 
because it is not visible in the code inside the function MCPrice. So we redefine 
the function and then call MCPrice again 

R> f <- function(x) max(0, x - 110)
R> set.seed(123)
R> M <- 50000
R> MCPrice(x = S0, t = 0, T = T, r = r, sigma, M = M, f = f)
[1] 1.991576

We do not give here too much detail on how to set up a cluster with R because 
an extensive explanation is given in Section A.6, and we do not mean to be efficient 
in the code above, but just to show that it is a feasible approach. In our example 
below, we will set up a cluster using the snowfall package, but a similar cluster 
can be set up using the multicore package. Now, to test this parallelized version 
we set up a cluster of two cpus (assuming, e.g., a multicore processor), assign 
it to an object cl and start up a random number generator for parallelized tasks 

R> require(snowfall)
R> sfInit(parallel = TRUE, cpus = 2)
R Version:  R version 2.10.1 (2009-12-14) 
R> cl <- sfGetCluster()
R> clusterSetupRNG(cl, seed = rep(123, 2))
[1] "RNGstream"

next we inform the foreach package that it can make use of the snow cluster 

R> require(foreach)
R> require(doSNOW)
R> registerDoSNOW(cl)

Just to be sure that everything is working correctly, we check how many nodes 
there are in our cluster using the command getDoParWorkers 

R> getDoParWorkers()
[1] 2

Now we are ready to call MCPrice, but this time the execution will use the nodes 
of the cluster 

R> M <- 50000
R> MCPrice(x = S0, t = 0, T = T, r = r, sigma, M = M, f = f)
[1] 1.992274

When finished with our simulations, we should not forget to stop the cluster 
with sfStop, but before doing that we give an empirical proof about the 
speed of convergence of the naive Monte Carlo method. The experiment is as 
follows: we make the number m of Monte Carlo replications vary in the set 
m = (10, 50, 100, 150, 200, 250, 500, 1000). For each value of M = m, we replicate 
the experiment repl = 100 times. In this way, we can evaluate the empirical 
variability of the Monte Carlo estimate and compare with the theoretical 
confidence bands. The result is shown in Figure 6.2. The next code executes the 
double Monte Carlo runs 

R> set.seed(123)
R> m <- c(10, 50, 100, 150, 200, 250, 500, 1000)
R> p1 <- NULL
R> err <- NULL
R> nM <- length(m)
R> repl <- 100
R> mat <- matrix(, repl, nM)
R> for (k in 1:nM) {
+     tmp <- numeric(repl)
+     for (i in 1:repl) tmp[i] <- MCPrice(x = S0, t = 0, T = T, 
+         r = r, sigma, M = m[k], f = f)
+     mat[, k] <- tmp
+     p1 <- c(p1, mean(tmp))
+     err <- c(err, sd(tmp))
+ }
R> colnames(mat) <- m

The next portion of code is less interesting because it is only about the graphical 
representation of the results of the experiment, so we present it without additional 
comments. 

R> p0 <- GBSOption(TypeFlag = "c", S = S0, X = K, Time = T, r = r, 
+     b = r, sigma = sigma)@price
R> minP <- min(p1 - err)
R> maxP <- max(p1 + err)
R> plot(m, p1, type = "n", ylim = c(minP, maxP), axes = FALSE, ylab = "MC price", 
+     xlab = "MC replications")
R> lines(m, p1 + err, col = "blue")
R> lines(m, p1 - err, col = "blue")
R> axis(2, p0, "B&S price")
R> axis(1, m)
R> boxplot(mat, add = TRUE, at = m, boxwex = 15, col = "orange", 
+     axes = FALSE)
R> points(m, p1, col = "blue", lwd = 3, lty = 3)
R> abline(h = p0, lty = 2, col = "red", lwd = 3)

Finally, we close the cluster with sfStop and inform the foreach package 
that the cluster is no longer available using the command registerDoSEQ. 

R> sfStop()
R> registerDoSEQ()


6.2.5 Sensitivity of price to parameters 

We now analyze how the theoretical price reacts as a function of the parameters 
of the contract. For simplicity we consider the price of the European options 
only. The results which we obtain from the formula should be consistent with 
what we expect from the price of the option. We have seen that the price of a 
European call is available in the following explicit form: 

$$P_t = S_t\Phi(d_1) - e^{-r(T-t)}K\Phi(d_2)$$

with 

$$d_1 = d_2 + \sigma\sqrt{T - t}, \qquad d_2 = \frac{\ln\frac{S_t}{K} + \left(r - \frac12\sigma^2\right)(T - t)}{\sigma\sqrt{T - t}},$$

while for the put it reads as 

$$P_t = e^{-r(T-t)}K\Phi(-d_2) - S_t\Phi(-d_1).$$

Figure 6.2 The speed of convergence of the naive Monte Carlo price toward the 
true Black and Scholes price (horizontal axis: MC replications). 

Note that the price $P_t$ is a function of the interest rate but not of the drift 
$\mu$ of the underlying geometric Brownian motion. It is also a function of 
the underlying asset $S_t$, the strike price K, the volatility $\sigma$ and the time T. 
We will now discuss individually the effect of each of these parameters on the 
price $P_t$. 

6.2.5.1 Sensitivity to the value of underlying asset S t 

Now, consider all the rest of the parameters as fixed and assume that, at a given 
instant t, the value $S_t$ is much higher than the exercise price K. The ratio $S_t/K$ 
will be very large and thus both $d_2$ and $d_1$ will be large as well. In this case 
$\Phi(d_1) \simeq \Phi(d_2) \simeq 1$, so the price of the call option becomes 

$$S_t - e^{-r(T-t)}K$$

which means that the value of the underlying asset $S_t$ is so large that we will 
probably exercise the option and thus we have a nonrandom payoff. So the 
fair price of the option is just the difference between the present value of the 
asset and the discounted strike price K. Conversely, for the put we have 
$\Phi(-d_1) \simeq \Phi(-d_2) \simeq 0$, therefore the price of the option will be zero. 
Indeed, we will almost surely not exercise the option. 

6.2.5.2 Sensitivity to the volatility $\sigma$

Suppose now that the volatility $\sigma$ is zero, all the rest being fixed. The option
becomes a nonrisky contract, because the value of the underlying asset at time $T$
is nothing but $S_0 e^{rT}$, exactly like a bond. Then, the payoff of the call will be

$$\max(S_0 e^{rT} - K, 0)$$

which, discounted at time $t = 0$, becomes

$$e^{-rT} \max(S_0 e^{rT} - K, 0) = \max(S_0 - K e^{-rT}, 0).$$

The same conclusion can be derived directly from the Black and Scholes formula.
Indeed, suppose that $S_T > K$, i.e. $S_0 > e^{-rT} K$; then $\log(S_0/K) + rT > 0$. As $\sigma$
converges to 0, we have that both $d_1$ and $d_2$ diverge to $+\infty$ and then
$\Phi(d_1) \simeq \Phi(d_2) \to 1$. Therefore, the discounted payoff will be $S_0 - K e^{-rT}$.
Conversely, assume that $S_T < K$. In this case we have that $\log(S_0/K) + rT < 0$ and
then $d_1, d_2 \to -\infty$ as $\sigma \to 0$. Consequently, $\Phi(d_1) \simeq \Phi(d_2) \to 0$ and thus
the payoff is zero.
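The limit can also be observed numerically. The sketch below assumes a hypothetical helper bs_call (again, not the book's call.price) and evaluates the call price for a very small volatility on both sides of the threshold $S_0 = K e^{-rT}$.

```r
# Hypothetical helper: Black and Scholes call price, tau = time to maturity
bs_call <- function(x, K, r, sigma, tau) {
  d2 <- (log(x / K) + (r - 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  d1 <- d2 + sigma * sqrt(tau)
  x * pnorm(d1) - exp(-r * tau) * K * pnorm(d2)
}

S0 <- 100; r <- 0.01; tau <- 10
small_sigma <- 1e-6

# S0 > K * exp(-r * tau): the price tends to S0 - K * exp(-r * tau)
price_itm <- bs_call(S0, K = 100, r = r, sigma = small_sigma, tau = tau)
limit_itm <- S0 - 100 * exp(-r * tau)

# S0 < K * exp(-r * tau): the price tends to zero
price_otm <- bs_call(S0, K = 150, r = r, sigma = small_sigma, tau = tau)

print(c(price_itm = price_itm, limit_itm = limit_itm, price_otm = price_otm))
```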

6.2.5.3 Sensitivity to the strike price K 

Figure 6.3 shows that, as expected, the price of the call option is a decreasing
function of the strike price $K$. The plot represents the price of a call option with
$S_0 = 100$, $T = 100$, $r = 0.01$ given, and presents $P_t$ as a function of both $K$ and
the volatility $\sigma$. The following code has been used to produce Figure 6.3.

Figure 6.3 The sensitivity of the price of a call option with respect to the strike
price K.


R> S0 <- 100
R> r <- 0.01
R> T <- 100
R> p <- function(sigma) call.price(x = S0, t = 0, T = T, r = r,
+     K = K, sigma = sigma)
R> K <- 80
R> curve(p, 0, 1, xlab = expression(sigma),
+     ylab = expression(P[t]))
R> K <- 100
R> curve(p, 0, 1, add = TRUE, lty = 2)
R> K <- 150
R> curve(p, 0, 1, add = TRUE, lty = 3)
R> legend(0.5, 90, c("K=80", "K=100", "K=150"), lty = 1:3)



6.2.5.4 Sensitivity to the expiry time T 

Figure 6.4 shows that the price of the call option is an increasing function of
the expiry date $T$. The plot represents the price of a call option with $S_0 = 100$,
$K = 100$, $r = 0.01$ given, and presents $P_t$ as a function of both $T$ and the
volatility $\sigma$. The following code has been used to produce Figure 6.4.


R> S0 <- 100
R> r <- 0.01
R> K <- 100
R> T <- 10
R> curve(p, 0, 1, xlab = expression(sigma),
+     ylab = expression(P[t]), ylim = c(0, 100))
R> T <- 50
R> curve(p, 0, 1, add = TRUE, lty = 2)
R> T <- 100
R> curve(p, 0, 1, add = TRUE, lty = 3)
R> legend(0.5, 40, c("T=10", "T=50", "T=100"), lty = 1:3)

Figure 6.4 The sensitivity of the price of a call option with respect to the expiry
time T.


6.3 The δ-hedging and the Greeks

We have seen that the hedging portfolio $(a^H, b^H)$ is a key ingredient of the Black
and Scholes model. Remember also that the model derives an explicit formula
for both $a^H$ and $b^H$. In particular,

$$a^H(t) = \frac{\partial}{\partial S_t} C(t, S_t)$$

which can be rewritten as

$$a^H(t) = \frac{\partial}{\partial S_t} P_t.$$

Being a derivative, it can be interpreted as the sensitivity of the option price $P_t$ with
respect to the value of the underlying asset $S_t$. This derivative is also called $\delta$
and the corresponding portfolio strategy is called $\delta$-hedging. The quantity $\delta$ is
also one of the Greeks of the option. The attribute 'Greek' is only due to the
fact that originally Greek letters were used to define these quantities. We will
consider other Greeks at the end of this section. Here we just analyze the $\delta$
and, in particular, we consider the problem of evaluating this Greek, which is
straightforwardly applicable to all the other Greeks.

The first thing we note is that the Greek $\delta$ is a function of time, i.e. $\delta = \delta(t) =
a^H(t)$. This means that the hedging portfolio should be adapted continuously in
time. This is practically impossible for several good reasons, including transaction
costs (i.e. continuous adjusting would have a total transaction cost equal to $\infty$!).
This means that completeness cannot be attained at any instant.

We now derive the explicit formula of the $\delta$ in the case of the European call.
The hedging portfolio $(a^H, b^H)$ is such that $a^H(t) = \partial C(t, x)/\partial x$ evaluated at
$x = S_t$. We can calculate the derivative as follows:

$$\frac{\partial C(t,x)}{\partial x} = \frac{\partial}{\partial x} e^{-r(T-t)} E\{f(Z_T^{t,x})\} = e^{-r(T-t)} E\{f'(Z_T^{t,x}) Z_T^{t,1}\}.$$

The exchange of integration and differentiation above is admissible because the
variable $x$ is just a scaling factor. Indeed,

$$Z_T^{t,x} = x \exp\left\{\left(r - \frac{1}{2}\sigma^2\right)(T-t) + \sigma(B_T - B_t)\right\}$$

so the law of $Z_T^{t,x}$ depends essentially on the increments of the Brownian motion.
Further, $Z_T^{t,1}$ is nothing but $\partial Z_T^{t,x}/\partial x$. In fact, we have

$$\frac{\partial}{\partial x} Z_T^{t,x} = \frac{\partial}{\partial x}\, x \exp\left\{\left(r - \frac{1}{2}\sigma^2\right)(T-t) + \sigma(B_T - B_t)\right\}
= 1 \cdot \exp\left\{\left(r - \frac{1}{2}\sigma^2\right)(T-t) + \sigma(B_T - B_t)\right\} = Z_T^{t,1}.$$

The critical point is now the derivative of the payoff function. 

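Theorem 6.3.1 Let $f(x) = \max(0, x - K)$ be the payoff function of the European
call option. Then

$$a^H(t) = \delta = \Phi(d_1)$$

with $d_1$ from Theorem 6.2.2.

Proof. We have that $f(x) = \max(0, x - K)$, therefore

$$f'(x) = \begin{cases} 1, & x > K \\ 0, & x < K \end{cases}$$

and hence

$$\begin{aligned}
\frac{\partial C(t,x)}{\partial x} &= e^{-r(T-t)} E\{f'(Z_T^{t,x}) Z_T^{t,1}\} \\
&= e^{-r(T-t)} E\{Z_T^{t,1} 1_{\{Z_T^{t,x} > K\}}\} \\
&= e^{-r(T-t)} \int_{-d_2}^{\infty} \exp\left\{\left(r - \tfrac{1}{2}\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\, y\right\} \varphi(y)\, dy \\
&= \int_{-d_2}^{\infty} \exp\left\{-\tfrac{1}{2}\sigma^2 (T-t) + \sigma\sqrt{T-t}\, y\right\} \varphi(y)\, dy
\end{aligned}$$

with $y \sim N(0, 1)$ and $\varphi(\cdot)$ the density of the same Gaussian distribution. As
in the proof of Theorem 6.2.2, after the change of variable $z = y - \sigma\sqrt{T-t}$
we get

$$\frac{\partial C(t,x)}{\partial x} = \int_{-d_2 - \sigma\sqrt{T-t}}^{\infty} \varphi(z)\, dz = 1 - \Phi(-d_2 - \sigma\sqrt{T-t}) = 1 - \Phi(-d_1) = \Phi(d_1).$$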

The previous result says that the hedging portfolio is such that the $\delta$ is a value in $[0, 1]$
given that it is just the cumulative distribution function of a Gaussian random
variable evaluated at a particular point

$$0 \le a^H(t) = \delta = \Phi(d_1) \le 1.$$

The $\delta$ is then also called the hedge ratio for this reason, even though it is not
strictly a ratio. Notice also that the $\delta$ is the quantity of underlying assets we have
to include in the hedging portfolio, which amounts to at most one unit. In
this view, the hedge ratio is interpreted as a proportion, but in practice this implies
that in the Black and Scholes model all the quantities involved are infinitely
divisible, even though we did not mention this assumption explicitly earlier.

We finally note that the $\delta$ of a put option is given by the formula

$$\delta_{put} = \Phi(d_1) - 1 = -\Phi(-d_1)$$

which implies that while the $\delta$ of the call is always non-negative, the $\delta$ of the
European put option is always negative. This is also intuitive in that, if the price of the
underlying asset increases, the $\delta$ of the call increases, which means that the hedging
portfolio must include more assets than bonds. Vice versa for the put option.
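A short numerical illustration of the sign and range of the two hedge ratios follows. It is only a sketch: bs_delta is a hypothetical helper, and for the put it uses the standard identity $\delta_{put} = \Phi(d_1) - 1$.

```r
# Hypothetical helper: delta of European call and put options
bs_delta <- function(x, K, r, sigma, tau, type = c("call", "put")) {
  type <- match.arg(type)
  d1 <- (log(x / K) + (r + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  if (type == "call") pnorm(d1) else pnorm(d1) - 1
}

x <- seq(50, 150, by = 10)          # grid of values of the underlying asset
d_call <- bs_delta(x, K = 100, r = 0.01, sigma = 0.05, tau = 90, type = "call")
d_put  <- bs_delta(x, K = 100, r = 0.01, sigma = 0.05, tau = 90, type = "put")

# The call delta lies in [0, 1], the put delta in [-1, 0],
# and their difference is identically one.
print(round(cbind(x, d_call, d_put, difference = d_call - d_put), 4))
```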

6.3.1 The hedge ratio as a function of time 

It is also interesting to see how the $\delta$ varies along the duration of the contract.
Let us consider a European option with $T = 100$, $K = 100$, and let $r = 0.01$,
$\sigma = 0.05$. Figure 6.5 represents the value of the hedge ratio $a^H(t)$ as a function
of the underlying asset $S_t$ for different values of the time $t$. The plot shows that, as
time approaches maturity ($t = 99.5$) and the underlying asset $S_t$ exceeds the
strike price $K$, the hedging portfolio should contain essentially only assets rather
than bonds because $\delta$ is almost equal to 1. The converse holds if the underlying
asset $S_t$ is smaller than $K$: in this case, the hedging portfolio should contain only
bonds, i.e. $\delta \simeq 0$. For all times $t$ before maturity, it is also true that if the value of
the underlying asset is much higher than the strike price $K$, the $\delta$ approaches 1
and vice versa if $S_t$ is much lower than $K$.

R> delta <- function(x) {
+     d2 <- (log(x/K) + (r - 0.5 * sigma^2) * (T - t))/(sigma *
+         sqrt(T - t))
+     d1 <- d2 + sigma * sqrt(T - t)
+     pnorm(d1)
+ }
R> r <- 0.01
R> K <- 100
R> T <- 100
R> sigma <- 0.05
R> t <- 1
R> curve(delta, 0, 200, xlab = expression(S[t]),
+     ylab = expression(delta = a^H(t)))
R> t <- 50
R> curve(delta, 0, 200, lty = 2, add = TRUE)
R> t <- 99.5
R> curve(delta, 0, 200, lty = 3, add = TRUE)
R> legend(150, 0.6, c("t=1", "t=50", "t=99.5"), lty = 1:3)

6.3.2 Hedging of generic options 

There are other cases in which the payoff function of the claim is not
differentiable or when its derivative is meaningless. For example, the digital or
binary option is a contract which pays a constant amount, say $c$, if
$S_T > K$ and zero otherwise. In this case, the payoff function can be written
as $f(x) = c 1_{\{x > K\}}$ but its derivative is zero for all values of $x$ different from
$K$, which is a discontinuity point. This means that we cannot apply the previous
formula, because we obtain

$$\delta = a^H(t) = \frac{\partial C(t,x)}{\partial x} = e^{-r(T-t)} E\{f'(Z_T^{t,x}) Z_T^{t,1}\} = 0.$$

This clearly implies an unrealistic allocation of the portfolio with only bonds and
no assets. We now present another widely used method to overcome the problem
with differentiation of the payoff function, which is the density method.



Figure 6.5 The sensitivity of the $\delta$ of a call option with respect to time t.

6.3.3 The density method

Theorem 6.3.2 Let $f$ be the payoff function of a contingent claim. Then

$$C_x(t, x) = e^{-r(T-t)} E\{g(t, x, Z_T^{t,x})\}$$

with

$$g(t, x, z) = f(z)\, \frac{\log\frac{z}{x} - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{x \sigma^2 (T-t)}.$$

Notice that in the above formula the payoff function $f(\cdot)$ now appears as is,
without even requiring the existence of its first derivative. The proof is very
elementary.

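Proof. As before, let us define

$$Y = \log x + \left(r - \frac{1}{2}\sigma^2\right)(T-t) + \sigma(B_T - B_t), \qquad Z_T^{t,x} = \exp(Y).$$

Thus

$$Y \sim N\left(\log x + \left(r - \frac{1}{2}\sigma^2\right)(T-t),\ \sigma^2 (T-t)\right).$$

Denote by $p_Y(y; x)$ the density of the random variable $Y$, keeping the explicit
dependence on $x$ in our notation. Then

$$C(t, x) = e^{-r(T-t)} E f(Z_T^{t,x}) = e^{-r(T-t)} \int f(e^y)\, p_Y(y; x)\, dy.$$

Now we proceed with differentiation

$$\begin{aligned}
C_x(t,x) = \frac{\partial}{\partial x} C(t,x) &= e^{-r(T-t)} \int f(e^y) \frac{\partial}{\partial x} p_Y(y; x)\, dy \\
&= e^{-r(T-t)} \int f(e^y)\, p_Y(y; x)\, \frac{y - \log x - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{x \sigma^2 (T-t)}\, dy \\
&= e^{-r(T-t)} E\left\{ f(Z_T^{t,x})\, \frac{\log Z_T^{t,x} - \log x - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{x \sigma^2 (T-t)} \right\} \\
&= e^{-r(T-t)} E\{g(t, x, Z_T^{t,x})\}
\end{aligned}$$

because $Y = \log Z_T^{t,x}$ and

$$p_Y(y; x) = \frac{1}{\sqrt{2\pi\sigma^2 (T-t)}} \exp\left\{ -\frac{\left(y - \left(\log x + \left(r - \frac{1}{2}\sigma^2\right)(T-t)\right)\right)^2}{2\sigma^2 (T-t)} \right\}.$$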
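To see the density method at work without any simulation, the expectation can be written as an integral over a standard Gaussian variable and evaluated with integrate. The sketch below is not part of the original code: density_delta is a hypothetical helper, the truncation of the integral at plus or minus 8 standard deviations is an assumption, and the digital payoff uses $c = 1$. For the digital option the reference value is obtained by differentiating its closed-form price $e^{-r(T-t)}\Phi(d_2)$.

```r
# Density-method delta: exp(-r * tau) * E{ g(t, x, Z) }, with the expectation
# written as an integral over the standard Gaussian variable y.
# Truncating at +/- 8 standard deviations is an assumption of this sketch.
density_delta <- function(f, x, r, sigma, tau) {
  integrand <- function(y) {
    z <- x * exp((r - 0.5 * sigma^2) * tau + sigma * sqrt(tau) * y)
    w <- (log(z / x) - (r - 0.5 * sigma^2) * tau) / (x * sigma^2 * tau)
    f(z) * w * dnorm(y)
  }
  exp(-r * tau) * integrate(integrand, -8, 8)$value
}

x <- 70; K <- 100; r <- 0.01; sigma <- 0.05; tau <- 90
d2 <- (log(x / K) + (r - 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
d1 <- d2 + sigma * sqrt(tau)

call_payoff    <- function(z) pmax(z - K, 0)
digital_payoff <- function(z) as.numeric(z > K)   # pays c = 1 if z > K

delta_call    <- density_delta(call_payoff, x, r, sigma, tau)
delta_digital <- density_delta(digital_payoff, x, r, sigma, tau)

# Reference values: Phi(d1) for the call; derivative of exp(-r * tau) * Phi(d2)
# with respect to x for the digital option
exact_call    <- pnorm(d1)
exact_digital <- exp(-r * tau) * dnorm(d2) / (x * sigma * sqrt(tau))

print(c(delta_call = delta_call, exact_call = exact_call,
        delta_digital = delta_digital, exact_digital = exact_digital))
```

The digital option receives a strictly positive hedge ratio here, in contrast with the value zero produced by differentiating the payoff directly.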
6.3.4 The numerical approximation

Because $C_x(t, x)$ is a derivative, it is possible to evaluate the $\delta$ using the incremental
ratio as seen in Section 4.2. Indeed, we can evaluate the derivative as

$$\frac{\partial}{\partial x} C(t, x) = \lim_{h \to 0} \frac{C(t, x + h) - C(t, x)}{h}$$

which we can approximate numerically as follows:

$$\frac{\partial}{\partial x} C(t, x) \simeq \frac{C(t, x + h) - C(t, x)}{h}$$

with very small $h$. For example,


R> r <- 0.01 
R> K <- 100 
R> T <- 100 
R> sigma <- 0.05 
R> t <- 10 
R> St <- 70 
R> h <- 0.01 

R> delta.num <- function(x) (call.price(x = x + h, t = t, T = T,
+     sigma = sigma, r = r, K = K) - call.price(x = x, t = t,
+     T = T, sigma = sigma, r = r, K = K))/h

R> delta(St) 


[1] 0.9166063 


R> delta.num(St) 


[1] 0.9166294 


Another option is to use the centered derivative, which is more accurate than the
simple incremental ratio (its error is of order $h^2$ rather than $h$):

$$\frac{\partial}{\partial x} C(t, x) \simeq \frac{C(t, x + h) - C(t, x - h)}{2h}$$

R> delta.num2 <- function(x) (call.price(x = x + h, t = t, T = T,
+     sigma = sigma, r = r, K = K) - call.price(x = x - h, t = t,
+     T = T, sigma = sigma, r = r, K = K))/(2 * h)

R> delta(St) 

[1] 0.9166063 

R> delta.num2(St) 


[1] 0.9166063 
If we decrease h, the approximation converges quickly to the true value, also for
the simple approximation, as shown in the next numerical example.

R> delta(St)

[1] 0.9166063

R> h <- 0.001
R> delta.num(St)

[1] 0.9166086

R> h <- 1e-04
R> delta.num(St)

[1] 0.9166066

R> h <- 1e-05
R> delta.num(St)

[1] 0.9166063
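The different orders of accuracy of the two schemes can be tabulated directly. The following sketch assumes a hypothetical bs_call helper with the same parameters as above (time to maturity $T - t = 90$) and compares the absolute errors of the forward and centered ratios for decreasing $h$.

```r
# Hypothetical helper: Black and Scholes call price with default parameters
# matching the example in the text (K = 100, r = 0.01, sigma = 0.05, T - t = 90)
bs_call <- function(x, K = 100, r = 0.01, sigma = 0.05, tau = 90) {
  d2 <- (log(x / K) + (r - 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  d1 <- d2 + sigma * sqrt(tau)
  x * pnorm(d1) - exp(-r * tau) * K * pnorm(d2)
}

x <- 70
exact <- pnorm((log(x / 100) + (0.01 + 0.5 * 0.05^2) * 90) / (0.05 * sqrt(90)))

h <- 10^-(2:5)                                   # 0.01, 0.001, 1e-04, 1e-05
forward  <- (bs_call(x + h) - bs_call(x)) / h
centered <- (bs_call(x + h) - bs_call(x - h)) / (2 * h)

errors <- data.frame(h = h,
                     forward  = abs(forward - exact),
                     centered = abs(centered - exact))
print(errors)
```

The forward error shrinks roughly in proportion to $h$, while the centered error is already negligible at $h = 0.01$; for much smaller $h$ floating-point cancellation eventually dominates both.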


6.3.5 The Monte Carlo approach 

When the formula of the price is not available in explicit form because,
for example, the underlying process is not a geometric Brownian motion, the
Monte Carlo method is useful in the evaluation of the Greeks. As usual, being
a Monte Carlo method, it is always necessary to take into consideration the
speed of convergence of the method. The Monte Carlo method can be used
to calculate the $\delta$ directly using the density method or inside the numerical
differentiation. We first consider the density method approach because it
involves a simple modification of the formula of the option price. In this case,
the $\delta$ is given by direct evaluation of the expected value

$$\delta = e^{-r(T-t)} E\{g(t, x, Z_T^{t,x})\}$$

with

$$g(t, x, z) = f(z)\, \frac{\log\frac{z}{x} - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{x \sigma^2 (T-t)}$$

and we already know that, in order to generate the random variable $Z_T^{t,x}$, we just
need to generate Gaussian random variables and then apply the proper transform.


R> MCdelta <- function(x = 1, t = 0, T = 1, r = 1, sigma = 1,
+     M = 1000, f) {
+     h <- function(m) {
+         # each worker simulates its own share of m replications,
+         # half of them as antithetic draws
+         u <- rnorm(m/2)
+         tmp <- c(x * exp((r - 0.5 * sigma^2) * (T - t) + sigma *
+             sqrt(T - t) * u), x * exp((r - 0.5 * sigma^2) * (T -
+             t) + sigma * sqrt(T - t) * (-u)))
+         g <- function(z) f(z) * (log(z/x) - (r - 0.5 * sigma^2) *
+             (T - t))/(x * sigma^2 * (T - t))
+         mean(sapply(tmp, function(z) g(z)))
+     }
+     nodes <- getDoParWorkers()
+     p <- foreach(m = rep(M/nodes, nodes), .combine = "c") %dopar%
+         h(m)
+     p <- mean(p)
+     p * exp(-r * (T - t))
+ }

We now test the new function MCdelta

R> r <- 0.01
R> K <- 100
R> T <- 100
R> t <- 10
R> sigma <- 0.05
R> S0 <- 70
R> f <- function(x) max(0, x - 100)
R> delta(S0)

[1] 0.9166063

R> set.seed(123)
R> M <- 10000
R> MCdelta(x = S0, t = 0, T = T, r = r, sigma, M = M, f = f)

[1] 0.9260659
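For readers without a registered parallel backend, the same estimator can be written in a few vectorized lines. This is a sequential sketch under stated assumptions, not the function used above: mc_delta_density is a hypothetical name, the payoff must be vectorized (pmax instead of max), and the number of replications is chosen arbitrarily.

```r
# Sequential, vectorized sketch of the density-method estimator of delta
mc_delta_density <- function(x, t, T, r, sigma, M, f) {
  tau <- T - t
  u <- rnorm(M / 2)
  u <- c(u, -u)                                   # antithetic pairs
  z <- x * exp((r - 0.5 * sigma^2) * tau + sigma * sqrt(tau) * u)
  g <- f(z) * (log(z / x) - (r - 0.5 * sigma^2) * tau) / (x * sigma^2 * tau)
  exp(-r * tau) * mean(g)
}

set.seed(123)
est <- mc_delta_density(x = 70, t = 10, T = 100, r = 0.01, sigma = 0.05,
                        M = 1e5, f = function(z) pmax(z - 100, 0))
exact <- pnorm((log(70 / 100) + (0.01 + 0.5 * 0.05^2) * 90) / (0.05 * sqrt(90)))
print(c(estimate = est, exact = exact))
```

Note that here the estimator and the exact value are computed for the same time $t = 10$, so the two numbers are directly comparable.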


6.3.6 Mixing Monte Carlo and numerical approximation 

There are cases in which it is not possible to use the density method due to
the fact that the underlying process is not the geometric Brownian motion. In
this case, Monte Carlo is required to simulate paths of the underlying process
but Monte Carlo alone is not enough. The idea is then to evaluate the derivative of
the Monte Carlo price pathwise. So, for example, the following approach:

R> r <- 0.01
R> K <- 100
R> T <- 100
R> t <- 0
R> sigma <- 0.05
R> S0 <- 70
R> delta(S0)

[1] 0.9378105

R> h <- 0.001
R> p1 <- MCPrice(x = S0 + h, t = 0, T = T, r = r, sigma, M = M,
+     f = f)
R> p2 <- MCPrice(x = S0 - h, t = 0, T = T, r = r, sigma, M = M,
+     f = f)
R> p1

[1] 34.4279

R> p2

[1] 34.3844

R> (p1 - p2)/(2 * h)

[1] 21.74834

is wrong because the two Monte Carlo prices $C(0, S_0 + h)$ (p1) and $C(0, S_0 - h)$ (p2)
are calculated on different trajectories. But, if we set the same seed for the random
number generator, we actually use the same trajectories. Indeed, we have

R> delta(S0)

[1] 0.9378105

R> set.seed(123)
R> p1 <- MCPrice(x = S0 + h, t = 0, T = T, r = r, sigma, M = M,
+     f = f)
R> set.seed(123)
R> p2 <- MCPrice(x = S0 - h, t = 0, T = T, r = r, sigma, M = M,
+     f = f)
R> p1

[1] 34.26478

R> p2

[1] 34.26291

R> (p1 - p2)/(2 * h)

[1] 0.9368775

But, as mentioned, we are using the same trajectories, so there is no need to
generate them twice and we can modify the code of MCPrice as follows:

R> MCdelta2 <- function(x = 1, t = 0, T = 1, r = 1, sigma = 1,
+     M = 1000, f, dx = 0.001) {
+     h <- function(m) {
+         # the same Gaussian draws drive both perturbed prices
+         u <- rnorm(m/2)
+         tmp1 <- c((x + dx) * exp((r - 0.5 * sigma^2) * (T - t) +
+             sigma * sqrt(T - t) * u), (x + dx) * exp((r - 0.5 *
+             sigma^2) * (T - t) + sigma * sqrt(T - t) * (-u)))
+         tmp2 <- c((x - dx) * exp((r - 0.5 * sigma^2) * (T - t) +
+             sigma * sqrt(T - t) * u), (x - dx) * exp((r - 0.5 *
+             sigma^2) * (T - t) + sigma * sqrt(T - t) * (-u)))
+         mean(sapply(tmp1, function(x) f(x)) - sapply(tmp2,
+             function(x) f(x)))/(2 * dx)
+     }
+     nodes <- getDoParWorkers()
+     p <- foreach(m = rep(M/nodes, nodes), .combine = "c") %dopar%
+         h(m)
+     p <- mean(p)
+     p * exp(-r * (T - t))
+ }

which is faster but gives, clearly, the same result as above:

R> set.seed(123)
R> MCdelta2(x = S0, t = 0, T = T, r = r, sigma = sigma, f = f,
+     M = M, dx = h)

[1] 0.9368775

R> delta(S0)

[1] 0.9378105
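The effect of reusing the same trajectories can be isolated in a few lines of base R. The sketch below is an illustration under assumptions (mc_price is a hypothetical helper and plain, non-antithetic sampling is used): the centered difference is computed once with independent draws and once with common draws.

```r
set.seed(123)
x <- 70; K <- 100; r <- 0.01; sigma <- 0.05; tau <- 100
M <- 10000; h <- 0.001
f <- function(z) pmax(z - K, 0)

# Monte Carlo price for a given vector u of standard Gaussian draws
mc_price <- function(x, u) {
  exp(-r * tau) * mean(f(x * exp((r - 0.5 * sigma^2) * tau + sigma * sqrt(tau) * u)))
}

u1 <- rnorm(M)
u2 <- rnorm(M)
delta_indep  <- (mc_price(x + h, u1) - mc_price(x - h, u2)) / (2 * h)  # different paths
delta_common <- (mc_price(x + h, u1) - mc_price(x - h, u1)) / (2 * h)  # same paths

delta_exact <- pnorm((log(x / K) + (r + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau)))
print(c(independent = delta_indep, common = delta_common, exact = delta_exact))
```

With independent draws the Monte Carlo error of each price is divided by $2h$ and swamps the estimate; with common draws the errors largely cancel in the difference.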


6.3.7 Other Greeks of options 

We have seen that the first Greek $\delta$ represents the sensitivity of the price of the
derivative with respect to the variation of the value of the underlying asset. It is
possible to study the sensitivity with respect to the other inputs of the formula.


6.3.7.1 The Greek theta 


The Greek $\theta$ (theta) represents the sensitivity of the price with respect to time or,
better, how the price varies as a function of time. It is defined as the derivative of
$C(t, x)$ with respect to the variable $t$. For the European call option it corresponds to

$$\text{theta} = \theta = C_t(t, x) = -rKe^{-r(T-t)}\Phi(d_2) - \frac{\sigma x}{2\sqrt{T-t}}\varphi(d_1) < 0$$

where $\varphi(\cdot)$ is the density of the standard Gaussian random variable. Notice that
it is always a negative value, which implies that as time passes by, the value of
the option tends to decrease. For the put option the corresponding formula is

$$\text{theta} = \theta_{put} = C_t(t, x) = rKe^{-r(T-t)}\Phi(-d_2) - \frac{\sigma x}{2\sqrt{T-t}}\varphi(d_1)$$

and this value is not necessarily negative, which means that the price of the put
may also increase with time.

6.3.7.2 The Greek gamma

Another interesting Greek is the so-called $\gamma$ (gamma), which is the second
derivative of the option price with respect to the argument $x$ or, in other words,
the derivative of the $\delta$ with respect to $x$. So it represents the sensitivity of the
hedge ratio with respect to the variation of the underlying asset. For the European
call and put options it has the following form:

$$\text{gamma} = \gamma = C_{xx}(t, x) = \frac{\partial}{\partial x} C_x(t, x) = \frac{1}{\sigma x \sqrt{T-t}}\varphi(d_1) > 0.$$

In this case there is no difference between put and call options.


6.3.7.3 The Greek rho 

The Greek $\rho$ (rho) represents the sensitivity of the option price with respect to
the interest rate. For the European call option it has the following form:

$$\text{rho} = \rho = \frac{\partial C(t, x)}{\partial r} = K(T-t)\Phi(d_2)e^{-r(T-t)} > 0$$

while for the European put it has the form:

$$\text{rho} = \rho_{put} = \frac{\partial C(t, x)}{\partial r} = -K(T-t)\Phi(-d_2)e^{-r(T-t)} < 0.$$


6.3.7.4 The Greek kappa 

The Greek kappa represents the sensitivity of the option price with respect to
the strike price $K$. For the European call option it has the form:

$$\text{kappa} = \kappa = \frac{\partial C(t, x)}{\partial K} = e^{-r(T-t)}(\Phi(-d_2) - 1) < 0$$

and for the put option:

$$\text{kappa} = \kappa_{put} = \frac{\partial C(t, x)}{\partial K} = e^{-r(T-t)}\Phi(-d_2) > 0.$$


6.3.7.5 The Greek vega 

The Greek vega represents the sensitivity of the option price with respect to the
volatility. For the European call and put options it has the form:

$$\text{vega} = \frac{\partial C(t, x)}{\partial \sigma} = x\sqrt{T-t}\,\varphi(d_1) > 0.$$

Clearly vega is not a letter of the Greek alphabet, but all kinds of derivatives
of $C(t, x)$ are called Greeks in mathematical finance. There is an enormous
number of Greeks with funny names (volga, vanna, charm, speed, color) that can
be defined. For a vast review one can see Haug (1997). A last remark about the
Greeks is that they are fundamentally linked to the Black and Scholes equation
(6.1). Indeed, we can rewrite

$$rC(t, x) = C_t(t, x) + rxC_x(t, x) + \frac{1}{2}\sigma^2 x^2 C_{xx}(t, x)$$

in terms of the Greeks as follows:

$$rC(t, S_t) = \theta_t + rS_t\delta_t + \frac{1}{2}\sigma^2 S_t^2 \gamma_t.$$

6.3.8 Put and call Greeks with Rmetrics 

The package fOptions, which we have already encountered, allows direct
calculation of the Greeks for put and call options via the function
GBSCharacteristics. Here we give a self-explanatory example for the European call option.

R> r <- 0.01
R> K <- 100
R> T <- 100
R> t <- 10
R> sigma <- 0.05
R> S0 <- 70
R> GBSCharacteristics(TypeFlag = "c", S = S0, X = K, Time = T -
+     t, r = r, b = r, sigma = sigma)

$premium
[1] 30.89978

$delta
[1] 0.9166063

$theta
[1] -0.360923

$vega
[1] 101.8671

$rho
[1] 2993.639

$lambda
[1] 2.076469

$gamma
[1] 0.004619824

The function GBSCharacteristics calculates other parameters of interest (Haug
1997) about which we do not comment here. Similar output can be obtained for
the put option.

R> GBSCharacteristics(TypeFlag = "p", S = S0, X = K, Time = T -
+     t, r = r, b = r, sigma = sigma)

$premium
[1] 1.556748

$delta
[1] -0.08339374

$theta
[1] 0.04564667

$vega
[1] 101.8671

$rho
[1] -665.4879

$lambda
[1] -3.749842

$gamma
[1] 0.004619824


6.4 Pricing under the equivalent martingale measure 

In Section 6.1.2 we have seen that it is possible to price options as if they were
nonrisky contracts by changing the measure in the expected value of the payoff
$X$ from the physical measure $P$ to a risk-free measure $Q$. We now derive the
same approach to option pricing in the continuous case under the Black and
Scholes market.

Definition 6.4.1 A probability measure $Q$ on $(\Omega, \mathcal{A})$ is called an equivalent
martingale measure if there exists a positive random variable $Y > 0$ such that

$$Q(A) = E\{1_A Y\} = \int_A Y(\omega) P(d\omega), \quad \forall A \in \mathcal{A}$$

and the discounted price process $S_t^* = e^{-rt} S_t$ is a martingale with respect to $Q$.

Contrary to the one-period market, in which the measure $Q$ transforms quantities into
risk-free financial activities, in the above definition $Q$ transforms the discounted
stock price into a martingale. Such a measure, when it exists, is still called the
risk-neutral measure.

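Theorem 6.4.2 Let $\lambda \in \mathbb{R}$ and define

$$M_t = \exp\left\{-\lambda B_t - \frac{1}{2}\lambda^2 t\right\}.$$

Define further a new probability measure $Q$ as

$$Q(A) = E(1_A M_T), \quad A \in \mathcal{A}.$$

Then, $W_t = B_t + \lambda t$, $0 \le t \le T$, is a Brownian motion with respect to $Q$.

Proof. To prove that $W_t$ is a Brownian motion under the new measure $Q$,
we have to show that its characteristic function is the one of the random variable
$N(0, t)$ for each $t$. First, notice that $Q(A) = E(1_A M_T) = \int_A M_T(\omega) P(d\omega)$.
Therefore, we can rewrite integrals in this way: $E_Q(X) = E(M_T X)$. We now
calculate the characteristic function of $W_t$:

$$\begin{aligned}
\varphi_{W_t}(u) = E_Q\left(e^{iuW_t}\right) &= E\left(M_T e^{iuW_t}\right) \\
&= e^{iu\lambda t - \frac{1}{2}\lambda^2 T} E\left(\exp\{-\lambda B_T + iuB_t\}\right) \\
&= e^{iu\lambda t - \frac{1}{2}\lambda^2 T} E\left(e^{-\lambda(B_T - B_t)} e^{(iu - \lambda)B_t}\right) \\
&= e^{iu\lambda t - \frac{1}{2}\lambda^2 T} E\left(e^{-\lambda(B_T - B_t)}\right) E\left(e^{(iu - \lambda)B_t}\right)
\end{aligned}$$

where the last step uses the independence of the increment $B_T - B_t \sim N(0, T - t)$
from $B_t \sim N(0, t)$. Both expectations are exponential moments of Gaussian random
variables: $E(e^{-\lambda(B_T - B_t)}) = e^{\frac{1}{2}\lambda^2(T-t)}$ and
$E(e^{(iu-\lambda)B_t}) = e^{\frac{1}{2}(iu-\lambda)^2 t}$. If we replace the quantities
we have

$$\varphi_{W_t}(u) = e^{iu\lambda t - \frac{1}{2}\lambda^2 T}\, e^{\frac{1}{2}\lambda^2(T-t)}\, e^{\frac{1}{2}(iu-\lambda)^2 t} = e^{-\frac{1}{2}u^2 t}$$

which is the characteristic function of the Gaussian random variable $N(0, t)$.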

Clearly, $M_T$ is nothing but the Radon-Nikodym derivative in Girsanov's
theorem 3.16.1 for $W_t$ against $B_t$. Notice that $W_t$ is not a Brownian motion
under the physical measure $P$. Indeed, for $\lambda \neq 0$ and $t > 0$,

$$E(W_t) = E(B_t + \lambda t) = \lambda t \neq 0.$$

On the other hand, $M_t = \exp\{-\lambda B_t - \frac{1}{2}\lambda^2 t\}$ is a martingale under $P$. Indeed,
$M_t = f(t, B_t)$ with $f(t, x) = \exp\{-\lambda x - \frac{1}{2}\lambda^2 t\}$; thus, by direct application
of the Itô formula, we get

$$dM_t = -\lambda M_t\, dB_t + \frac{1}{2}\lambda^2 M_t\, dt - \frac{1}{2}\lambda^2 M_t\, dt = -\lambda M_t\, dB_t, \quad M_0 = 1$$

or $M_t = M_0 - \lambda \int_0^t M_s\, dB_s$, which proves that $M_t$ is a martingale (see Section 3.13.1).
Notice that $M_t$ is log-Normal distributed, so $E|M_t| = E M_t < \infty$.
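The change of measure can be illustrated by simulation: expectations under $Q$ are computed as expectations under $P$ weighted by $M_T$. The sketch below uses arbitrary illustrative values of $\lambda$, $t$ and $T$ (they are assumptions, not values from the text) and checks the first two moments of $W_t$ under both measures.

```r
set.seed(123)
lambda <- 0.5; t <- 1; T <- 2; M <- 2e5

# Simulate B_t and B_T under the physical measure P
B_t <- rnorm(M, sd = sqrt(t))
B_T <- B_t + rnorm(M, sd = sqrt(T - t))

M_T <- exp(-lambda * B_T - 0.5 * lambda^2 * T)   # Radon-Nikodym weight
W_t <- B_t + lambda * t

mass_Q <- mean(M_T)            # total mass of Q, close to 1
mean_P <- mean(W_t)            # under P: close to lambda * t
mean_Q <- mean(M_T * W_t)      # under Q: close to 0
var_Q  <- mean(M_T * W_t^2)    # under Q: close to t

print(c(mass_Q = mass_Q, mean_P = mean_P, mean_Q = mean_Q, var_Q = var_Q))
```

Matching two moments is of course weaker than the statement of the theorem; the sketch only makes the reweighting mechanism concrete.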

Exercise 6.2 Prove that Q defined in Theorem 6.4.2 is a probability measure. 

We can now state that the measure $Q$ defined in Theorem 6.4.2 is the equivalent
martingale measure we were aiming at.

Theorem 6.4.3 Let $\lambda = (\mu - r)/\sigma$. Then, $Q$ from Theorem 6.4.2 is the equivalent
martingale measure in the Black and Scholes model, i.e. $S_t^* = e^{-rt} S_t$ is a
martingale. Further, $S_t^*$ solves the following stochastic differential equation:

$$dS_t^* = \sigma S_t^*\, dW_t.$$

Proof. We have to prove that $S_t^* = e^{-rt} S_t$ is a martingale under the new
measure $Q$, because clearly $P$ and $Q$ are equivalent. So we start by writing the
stochastic differential equation for $S_t^*$ using the Itô formula for $f(t, x) = e^{-rt} x$:

$$\begin{aligned}
dS_t^* &= e^{-rt}\, dS_t - re^{-rt} S_t\, dt = e^{-rt}\mu S_t\, dt + e^{-rt}\sigma S_t\, dB_t - re^{-rt} S_t\, dt \\
&= (\mu - r) S_t^*\, dt + S_t^* \sigma \left(dW_t - \frac{\mu - r}{\sigma}\, dt\right) = \sigma S_t^*\, dW_t.
\end{aligned}$$

Clearly $S_0^* = S_0$, thus we also have that

$$S_t^* = S_0 + \sigma \int_0^t S_u^*\, dW_u.$$

Finally, by the martingale representation theorem (see Section 3.13.1) and the
fact that $W_t$ is a Brownian motion under the measure $Q$, we can conclude that
$S_t^*$ is a martingale.

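A further remark is that the process

$$Z_s^{t,x} = x \exp\left\{\left(r - \frac{1}{2}\sigma^2\right)(s - t) + \sigma(B_s - B_t)\right\}, \quad s \ge t,$$

of Equation 6.7 is replaced by the new translated Brownian motion under the $Q$
measure

$$S_s^{t,x} = x \exp\left\{\left(r - \frac{1}{2}\sigma^2\right)(s - t) + \sigma(W_s - W_t)\right\}, \quad s \ge t.$$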


Theorem 6.4.4 Let $Q$ be the risk-neutral measure of Theorem 6.4.2 and
$\lambda = (\mu - r)/\sigma$. Then, the price of the contingent claim with payoff $X = f(S_T)$
is given by

$$C(t, x) = e^{-r(T-t)} E_Q\{f(S_T^{t,x})\}$$

and the delta-hedge ratio is

$$\frac{\partial}{\partial x} C(t, x) = e^{-r(T-t)} E_Q\{g(t, x, S_T^{t,x})\}$$

with

$$g(t, x, s) = f(s)\, \frac{\log\frac{s}{x} - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{x \sigma^2 (T-t)}.$$

The proof is the same as the one under the physical measure $P$, thus we do not
replicate it. Apparently we did not gain much with the introduction of the
new measure $Q$, but this is not the case, in that we are now able to replicate the
ideas presented in Section 6.1.2 for the one-period market.

Exercise 6.3 Let $X_t = \exp\{\mu t + \sigma B_t\}$. Prove that $X_t$ is a martingale with respect
to the natural filtration of the Brownian motion if and only if $\mu = -\frac{1}{2}\sigma^2$.

6.4.1 Pricing of generic claims under the risk neutral measure 

Given a contingent claim with payoff $X$, there must exist a hedging portfolio
$H$ such that $H(T) = X$ and, due to non-arbitrage requirements, $H(t) = P(t)$.
Due to the fact that $S_t^*$ is a martingale and the discounted bond price is as well,
one can easily see that $H_t^* = e^{-rt} H_t$ is a martingale under the measure $Q$. Thus,
it is also true that

$$H_T^* = e^{-rT} H_T = e^{-rT} X.$$

By the martingale property of $H_t^*$ we have that

$$H_t^* = E_Q\{H_T^* \mid \mathcal{F}_t\} = E_Q\{e^{-rT} X \mid \mathcal{F}_t\}$$

and, by the non-arbitrage property $P_t = H_t$, we also have that

$$H_t^* = e^{-rt} P_t.$$

Therefore,

$$e^{-rt} P_t = H_t^* = e^{-rT} E_Q\{X \mid \mathcal{F}_t\}$$

and thus

$$P_t = e^{-r(T-t)} E_Q\{X \mid \mathcal{F}_t\}.$$

Finally, at time $t = 0$

$$P_0 = e^{-r(T-0)} E_Q\{X \mid \mathcal{F}_0\} = e^{-rT} E_Q(X).$$

So, the fair price of the contingent claim at time $t = 0$ is nothing but the
discounted value of the expected payoff, where the expectation is calculated under the
new measure $Q$.
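The formula $P_0 = e^{-rT} E_Q(X)$ translates directly into a Monte Carlo recipe: simulate the underlying with drift $r$, average the payoff and discount. The following sketch uses illustrative parameter values (an assumption) and compares the result with the closed-form call price; it also checks that the discounted terminal value has expectation $S_0$, as the martingale property requires.

```r
set.seed(123)
S0 <- 100; K <- 110; r <- 0.01; sigma <- 0.5; tau <- 1; M <- 2e5

# Terminal value of the underlying under Q: drift r instead of mu
z  <- rnorm(M)
ST <- S0 * exp((r - 0.5 * sigma^2) * tau + sigma * sqrt(tau) * z)

mc_price <- exp(-r * tau) * mean(pmax(ST - K, 0))   # discounted expected payoff
martingale_check <- exp(-r * tau) * mean(ST)        # should be close to S0

# Closed-form Black and Scholes price for comparison
d1 <- (log(S0 / K) + (r + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
d2 <- d1 - sigma * sqrt(tau)
bs_price <- S0 * pnorm(d1) - K * exp(-r * tau) * pnorm(d2)

print(c(mc_price = mc_price, bs_price = bs_price, martingale_check = martingale_check))
```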

6.4.2 Arbitrage and equivalent martingale measure 

Notice that not only does $S_t$ become a martingale under $Q$ (after discounting by the
factor $e^{-rt}$), but all related quantities behave as such and, in particular, the price
of the option $P_t^* = e^{-rt} P_t$ is also a martingale (bonds are always martingales in
this approach). The main effect of this new 'martingale world' is that arbitrage
is prevented by construction. Indeed, assume for a while that there exists an
arbitrage opportunity, i.e. there exists a portfolio strategy $(a, b, c)$ such that

$$V_0 \le 0, \quad V_T \ge 0 \quad \text{and} \quad E(V_T) > 0.$$

From the fact that $V_t^* = e^{-rt} V_t$ is also a martingale under $Q$, we have that

$$0 \ge V_0 = e^{-rT} E_Q(V_T \mid \mathcal{F}_0) = e^{-rT} E_Q(V_T)$$

or, better, $E_Q(V_T) \le 0$ and hence, since $V_T \ge 0$, we have that

$$Q(V_T > 0) = 0.$$

Given that $P$ and $Q$ are equivalent, we will also have that $P(V_T > 0) = 0$
and hence $P(V_T \le 0) = 1$, which makes our arbitrage assumption $E(V_T) > 0$
impossible!

We have just shown that, if there exists an equivalent martingale measure,
there are no arbitrage opportunities. It turns out that the converse is also true,
i.e. a market is arbitrage free if and only if there exists a martingale measure.
In this simple framework it is possible to state that there is only one martingale
measure, but in general (e.g. when the underlying process is not the geometric
Brownian motion) there may be multiple martingale measures, say $Q_1$, $Q_2$, etc.
In this case, $E_{Q_1}(X)$, $E_{Q_2}(X)$, etc., might all be different numbers, so a unique
fair price may not exist. In this situation hedging is a bit questionable but still
possible. We will consider a general statement of this fact in Chapter 6.7.

6.5 More on numerical option pricing 

Remember that for a contingent claim with payoff $X = f(S_T)$, we denote its
price at time $t$ by $P_t = C(t, S_t)$, with

$$C(t, x) = e^{-r(T-t)} E_Q\{f(S_T^{t,x})\}$$

and $Q$ the equivalent martingale measure. The process $S_u^{t,x}$ is a translated
geometric Brownian motion under $Q$ with explicit expression

$$S_T^{t,x} = x \exp\left\{\left(r - \frac{1}{2}\sigma^2\right)(T-t) + \sigma(W_T - W_t)\right\}.$$

As for the $Z_T^{t,x}$ under $P$, in order to simulate $S_T^{t,x}$ we just need to simulate
the increments of a standard Brownian motion. So, from the point of view of
the Monte Carlo approach, there is essentially no difference because
$B_T - B_t \sim W_T - W_t$. Similarly for the delta hedge ratio $\delta = a_t^H$,

$$a_t^H = \frac{\partial}{\partial x} C(t, S_t) \quad \text{and} \quad b_t^H = \frac{C(t, S_t) - a_t^H S_t}{R_t}.$$

By Theorem 6.4.4 we have

$$C_x(t, x) = e^{-r(T-t)} E_Q\{g(t, x, S_T^{t,x})\}$$

with

$$g(t, x, s) = f(s)\, \frac{\ln\frac{s}{x} - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{x \sigma^2 (T-t)}.$$

Notice that

$$g(t, x, S_T^{t,x}) = f(S_T^{t,x}) \frac{W_T - W_t}{x\sigma(T-t)} = f(S_T^{t,x}) \frac{Z\sqrt{T-t}}{x\sigma(T-t)} = f(S_T^{t,x}) \frac{Z}{x\sigma\sqrt{T-t}}$$

with $Z \sim N(0, 1)$. Thus, the Monte Carlo algorithm should just calculate

$$e^{-r(T-t)} \frac{1}{M} \sum_{i=1}^{M} f\left(x \exp\left\{\left(r - \frac{1}{2}\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\, Z_i\right\}\right) \frac{Z_i}{x\sigma\sqrt{T-t}}$$

after the simulation of $M$ pseudo-random numbers $Z_1, Z_2, \ldots, Z_M$ from the
standard Gaussian distribution. But, in general, in the standard Black and
Scholes model, there is no need to simulate under the measure $Q$. We only need
to know that this measure exists in order to know that the price calculated as in
Equation (6.9) is fair.
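The simplification of the weight can be checked draw by draw. In the sketch below (an illustration with the parameter values used earlier in the chapter; the variable names are ours) the full weight from $g$ and the simplified weight $Z/(x\sigma\sqrt{T-t})$ are computed on the same Gaussian draws and then used to estimate the $\delta$ of the call.

```r
set.seed(123)
x <- 70; K <- 100; r <- 0.01; sigma <- 0.05; tau <- 90; M <- 1e5
f <- function(s) pmax(s - K, 0)

z <- rnorm(M)
s <- x * exp((r - 0.5 * sigma^2) * tau + sigma * sqrt(tau) * z)

# Weight as it appears in g(t, x, s), and its simplified form
w_full  <- (log(s / x) - (r - 0.5 * sigma^2) * tau) / (x * sigma^2 * tau)
w_short <- z / (x * sigma * sqrt(tau))

delta_mc    <- exp(-r * tau) * mean(f(s) * w_short)
delta_exact <- pnorm((log(x / K) + (r + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau)))

print(all.equal(w_full, w_short))
print(c(delta_mc = delta_mc, delta_exact = delta_exact))
```

Using the simplified weight avoids computing a logarithm for every draw, which is the practical reason for the rewriting.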


6.5.1 Pricing of path-dependent options 

For path-dependent options, there rarely exist closed-form formulas like those for the
simple European call and put options. In this case, one possibility is to make use of
the general Equation (6.9) inside a Monte Carlo algorithm, but we need to modify
it because we need to simulate and keep the whole path of the process.

6.5.1.1 Barrier options 

These contracts are such that their payoff is zero if, during the time $0 \le t \le T$,
the event $S_t > \beta$ occurs, where $\beta$ is the barrier. For a European call with barrier,
the payoff function is given by the formula

$$X = \max\left(0, (S_T - K) 1_{\{S_t < \beta,\ t \in [0, T]\}}\right).$$

The price formula is

$$P(0) = e^{-rT} E_Q\left[\max\left(0, (S(T) - K) 1_{\{S(t) < \beta,\ t \in [0, T]\}}\right)\right]$$

and, for $t = 0$, $S_t^{0, S_0} = S_t$. Thus,

$$P_0 = e^{-rT} E_Q\left\{\max\left(0, (S_T - K) 1_{\{S_t < \beta,\ t \in [0, T]\}}\right)\right\}.$$

We need to simulate a full path up to the time instant when $S_t$ eventually crosses
the barrier $\beta$. To simulate $S_t$ we need to create a grid of times $t_i = i\Delta t$, with
$t_{i+1} - t_i = \Delta t = T/N$ and $N$ large enough. Then, we can simulate $S_{t_{i+1}}$
conditionally on $S_{t_i}$ as follows:

$$s_{i+1} = s_i \exp\left\{\left(r - \frac{1}{2}\sigma^2\right)\Delta t + \sigma\sqrt{\Delta t}\, Z\right\}$$

with $Z \sim N(0, 1)$ and $s_i = S(t_i)$. Figure 6.6 gives a representation of the algorithm.

Figure 6.6 Monte Carlo algorithm to price barrier options.
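A compact base R sketch of this scheme is given below. It is not the algorithm of Figure 6.6 line by line: mc_up_and_out_call is a hypothetical name, the barrier is monitored only on the time grid (so a crossing between two grid points is missed), all paths are simulated up to $T$ rather than stopped at the crossing time, and $S_0$ is assumed to lie below the barrier.

```r
# Up-and-out European call by Monte Carlo with discrete barrier monitoring
mc_up_and_out_call <- function(S0, K, barrier, r, sigma, T, N = 250, M = 10000) {
  dt <- T / N
  # One row per path, one column per time step of log-increments
  incr <- matrix((r - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * rnorm(M * N),
                 nrow = M, ncol = N)
  paths <- S0 * exp(t(apply(incr, 1, cumsum)))    # S(t_1), ..., S(t_N)
  alive <- apply(paths, 1, max) < barrier         # barrier never reached on the grid
  payoff <- pmax(paths[, N] - K, 0) * alive
  exp(-r * T) * mean(payoff)
}

set.seed(123)
knock_out <- mc_up_and_out_call(100, K = 100, barrier = 130, r = 0.01,
                                sigma = 0.2, T = 1)
set.seed(123)   # same paths, barrier switched off: a plain European call
vanilla <- mc_up_and_out_call(100, K = 100, barrier = Inf, r = 0.01,
                              sigma = 0.2, T = 1)
print(c(knock_out = knock_out, vanilla = vanilla))
```

With an infinite barrier the function reduces to the plain European call, which gives a convenient check against the Black and Scholes price; the knock-out price can never exceed it on the same paths.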
6.5.1.2 Asian options

The Asian option has a payoff of the form:

$$X = \max\left(0, \frac{1}{T}\int_0^T S_t\, dt - K\right).$$

In practice, the integral in most cases is discretized with formulas like the following

$$X = \max\left(0, \frac{1}{N + 1}\sum_{i=0}^{N} S(t_i) - K\right)$$

where $t_i = i\Delta t$, $i = 0, \ldots, N$, $\Delta t = T/N$. One way is to proceed as follows: generate several trajectories, evaluate the average of $S_t$ over $t \in [0, T]$ for each of them, apply the payoff function and finally discount the average payoff. The function MCAsian implements this using the usual foreach approach.

R> MCAsian <- function(S0 = 100, K = 100, t = 0, T = 1, mu = 0.1,
+     sigma = 0.1, r = 0.1, N = 100, M = 1000) {
+     require(foreach)
+     # h(x): average payoff over x simulated trajectories
+     h <- function(x) {
+         require(sde)
+         # column means = time average of each simulated path
+         # (note: the drift passed to sde.sim is mu, not r)
+         z <- colMeans(sde.sim(X0 = S0, model = "BS", theta = c(mu,
+             sigma), M = x, N = N))
+         f <- function(x) max(x - K, 0)
+         p0 <- mean(sapply(z, f))
+     }
+     nodes <- getDoParWorkers()
+     # split the M trajectories evenly across the available workers
+     p <- foreach(m = rep(M/nodes, nodes), .combine = "c") %dopar%
+         h(m)
+     p <- mean(p)
+     p * exp(-r * (T - t))
+ }

We can now test the function 

R> M <- 5000
R> mu <- 0.1
R> s <- 0.5
R> K <- 110
R> r <- 0.01
R> T <- 1
R> S0 <- 100
R> set.seed(123)
R> p0 <- MCAsian(S0 = S0, K = K, t = 0, T = T, mu = mu, sigma = s,
+     r = r, N = 250, M = M)
R> p0
[1] 9.835132

An alternative is to use a recursive approach (Benth 2004). Let us define $Z_i = (r - 0.5\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,\varepsilon_i$, with $\varepsilon_i$, $i = 1, \ldots, N$, independent draws from $N(0, 1)$. Then

$$S_i = S_{i-1}\exp\left\{\left(r - \frac{1}{2}\sigma^2\right)\Delta t + \sigma\sqrt{\Delta t}\,\varepsilon_i\right\} = S_{i-1}\exp(Z_i).$$

To explain the idea, let us set $N = 3$.

$$\begin{aligned}
\frac{1}{4}\sum_{i=0}^{3} S_i &= \frac{1}{4}(S_0 + S_1 + S_2 + S_3) \\
&= \frac{1}{4}(S_0 + S_0\exp Z_1 + S_2 + S_3) \\
&= \frac{1}{4}(S_0 + S_0\exp Z_1 + S_1\exp Z_2 + S_3) \\
&= \frac{1}{4}(S_0 + S_0\exp Z_1 + S_0\exp Z_1\exp Z_2 + S_3) \\
&= \frac{1}{4}(S_0 + S_0\exp Z_1(1 + \exp Z_2) + S_3) \\
&= \frac{1}{4}(S_0 + S_0\exp Z_1(1 + \exp Z_2) + S_0\exp Z_1\exp Z_2\exp Z_3) \\
&= \frac{S_0}{4}\left(1 + \exp Z_1\left(1 + \exp Z_2\left(1 + \exp Z_3\right)\right)\right)
\end{aligned}$$

Now let us define

$$Y_N = 1 + \exp Z_N, \qquad Y_{i-1} = 1 + \exp(Z_{i-1})\,Y_i, \quad i = N, N - 1, \ldots, 2.$$

Then

$$\frac{1}{N + 1}\sum_{i=0}^{N} S(t_i) = \frac{S_0}{N + 1}\, Y_1.$$
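
The backward recursion above can be checked numerically with a short base R sketch. The function name `avg_price_recursive` and the parameter values are assumptions made for illustration; the snippet compares the recursive average with the average of a path built directly from the same increments.

```r
# Average of S_0, ..., S_N through the backward recursion
# Y_N = 1 + exp(Z_N), Y_{i-1} = 1 + exp(Z_{i-1}) * Y_i
# (hypothetical helper, base R only).
avg_price_recursive <- function(S0, Z) {
  N <- length(Z)
  Y <- 1 + exp(Z[N])  # Y_N
  if (N > 1) {
    for (i in N:2) Y <- 1 + exp(Z[i - 1]) * Y  # Y_{i-1} from Y_i
  }
  S0 * Y / (N + 1)  # S_0 / (N + 1) * Y_1
}

set.seed(1)
r <- 0.01; sigma <- 0.5; N <- 250; dt <- 1 / N; S0 <- 100
Z <- (r - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * rnorm(N)

S <- S0 * exp(c(0, cumsum(Z)))  # S_0, S_1, ..., S_N built directly
direct <- mean(S)
recursive <- avg_price_recursive(S0, Z)
all.equal(direct, recursive)
```

The recursion only needs the increments $Z_i$ and a single running value, so the whole path does not have to be stored.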


Figure 6.7 contains the above algorithm for Monte Carlo option pricing.

Figure 6.7 Monte Carlo algorithm to price Asian options.


6.5.2 Asian option pricing via asymptotic expansion

The geometric Brownian motion process can also be interpreted in the framework of small diffusion processes, i.e. solutions to the stochastic differential equations

$$dX_t^\varepsilon = a(X_t^\varepsilon, \varepsilon)\,dt + b(X_t^\varepsilon, \varepsilon)\,dW_t, \quad \varepsilon \in (0, 1].$$

The geometric Brownian motion is the particular case when $a(x, \varepsilon) = \mu x$ and $b(x, \varepsilon) = \varepsilon x$ with $\sigma = \varepsilon \in (0, 1]$. Hence, another way to estimate the price of the Asian option is to recognize that the payoff function contains the evaluation of the functional

$$\frac{1}{T}\int_0^T X_t^\varepsilon\, dt$$

which is a particular case of the general functional

$$F^\varepsilon(X_t^\varepsilon) = \sum_{\alpha=0}^{r}\int_0^T f_\alpha(X_t^\varepsilon, \varepsilon)\, dW_t^\alpha + F(X_T^\varepsilon, \varepsilon)$$

setting $W_t^0 = t$ by definition and taking

$$f_0(x, \varepsilon) = \frac{x}{T}, \quad f_1(x, \varepsilon) = 0, \quad F(x, \varepsilon) = 0.$$


Asymptotic expansion via Malliavin calculus is implemented in the yuima package. The next code offers an example of use via the function AEAsian

R> AEAsian <- function(S0 = 100, K = 100, t = 0, T = 1, mu = 0.1,
+     e = 0.1, r = 0.1, N = 1000) {
+     require(yuima)
+     # geometric Brownian motion: dX = mu * X dt + e * X dW
+     diff.matrix <- matrix(c("x*e"), 1, 1)
+     model <- setModel(drift = c("mu*x"), diffusion = diff.matrix)
+     xinit <- S0
+     # functional: f0 = x/T, f1 = 0, F = 0
+     f <- list(expression(x/T), expression(0))
+     F <- 0
+     yuima <- setYuima(model = model, sampling = setSampling(Terminal = T,
+         n = 1000))
+     yuima <- setFunctional(yuima, f = f, F = F, xinit = xinit,
+         e = e)
+     F0 <- F0(yuima)
+     rho <- expression(0)
+     get_ge <- function(x, epsilon, K, F0) {
+         tmp <- (F0 - K) + (epsilon * x)
+         tmp[(epsilon * x) < (K - F0)] <- 0
+         return(tmp)
+     }
+     epsilon <- e
+     # call payoff written in terms of the expansion variable x
+     g <- function(x) {
+         tmp <- (F0 - K) + (epsilon * x)
+         tmp[(epsilon * x) < (K - F0)] <- 0
+         tmp
+     }
+     asymp <- asymptotic_term(yuima, block = 10, rho, g)
+     exp(-r * t) * (asymp$d0 + e * asymp$d1)
+ }

The AEAsian function first constructs a geometric Brownian motion model, then specifies the functional and finally calculates the asymptotic expansion up to the first-order term. For more details we suggest looking into the yuima manual. Its use is similar to that of MCAsian, but there is no simulation step and evaluation is almost instantaneous

R> p1 <- AEAsian(S0 = S0, K = K, t = 0, T = T, mu = mu, e = s,
+     r = r)
YUIMA: Solution variable (lhs) not specified. Trying to use state variables.
YUIMA: 'delta' (re)defined.
YUIMA: Get variables...
YUIMA: Done.
YUIMA: Initializing...
YUIMA: Done.
YUIMA: Calculating d0...
YUIMA: Done
YUIMA: Calculating d1 term...
YUIMA: Done
R> p1
[1] 10.26708

6.5.3 Exotic option pricing with Rmetrics 

We conclude by mentioning that the package fExoticOptions contains the implementation of several exact or expansion formulas for exotic options, including Asian and barrier options. Most formulas are taken from Haug (1997). For example, if we want to replicate the previous example we can use the function TurnbullWakemanAsianApproxOption

R> require(fExoticOptions)
R> p2 <- TurnbullWakemanAsianApproxOption("c", S = S0, SA = S0,
+     X = K, Time = T, time = T, tau = 0, r = r, b = r,
+     sigma = s)@price
R> p2
[1] 7.96567

Alternatively, one can use the fAsianOptions package, which includes several specific approximations for Asian options as well as functions to evaluate upper and lower bounds for the prices of several types of Asian options. For example,

R> require(fAsianOptions)
R> p3 <- GemanYorAsianOption("c", S = S0, X = K, Time = T, r = r,
+     sigma = s, doprint = FALSE)$price
R> p3
[1] 7.920475
R> p4 <- VecerAsianOption("c", S = S0, X = K, Time = T, r = r, sigma = s,
+     table = NA, nint = 800, eps = 1e-08, dt = 1e-10)
R> p4
[1] 7.920546
R> p5 <- ZhangAsianOption("c", S = S0, X = K, Time = T, r = r, sigma = s,
+     table = NA, correction = TRUE, nint = 800, eps = 1e-08,
+     dt = 1e-10)
R> p5
[1] 8.294416

Table 6.1 summarizes the different results.

Table 6.1 Different approximations of Asian option price.

Monte Carlo   Asymp. Exp.   Turnbull-Wakeman   Geman-Yor   Vecer   Zhang
9.84          10.27         7.97               7.92        7.92    8.29


6.6 Implied volatility and volatility smiles 

If we look at the market price of, e.g., a call option at a given time instant, we can compare it with the price predicted by the Black and Scholes formula. Let us denote by $p$ the price observed on the market. Now, consider the Black and Scholes price of a call option at time $t = 0$

$$p_0 = S_0\Phi(d_1) - e^{-rT}K\Phi(d_2)$$

with

$$d_1 = d_2 + \sigma\sqrt{T}, \qquad d_2 = \frac{\ln\frac{S_0}{K} + \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}.$$

Given the strike price $K$, the time to maturity $T$, the interest rate $r$, the current price of the asset $S_0$ and its volatility $\sigma$, we are able to calculate the predicted price $p_0$ by the above formula. We can compare this price $p_0$ with the market price $p$. The only delicate matter is which value of $\sigma$ we should plug into the formula. A natural choice is the historical volatility estimated on the log-returns (see Section 5.1.2). In the next example we consider the data for the Atlantia (ATL.MI) asset for the period from 23 July 2004 to 13 May 2005. We download the data from the Yahoo server using the function yahooSeries from the package fImport

R> require(fImport)
R> S <- yahooSeries("ATL.MI", from = "2004-07-23", to = "2005-05-13")
R> head(S)
GMT
           ATL.MI.Open ATL.MI.High ATL.MI.Low ATL.MI.Close ATL.MI.Volume
2005-05-13       20.82       20.87      20.50        20.55       5944700
2005-05-12       20.88       21.10      20.63        20.77       3324700
2005-05-11       20.81       21.01      20.66        20.86       7415700
2005-05-10       20.65       20.98      20.61        20.80       2357700
2005-05-09       20.40       20.66      20.23        20.60       4171500
2005-05-06       20.30       20.68      20.08        20.50       3038800
           ATL.MI.Adj.Close
2005-05-13            16.96
2005-05-12            17.14
2005-05-11            17.22
2005-05-10            17.17
2005-05-09            17.00
2005-05-06            16.92
R> Close <- S[, "ATL.MI.Close"]

and then look at the data with chartSeries from the quantmod package. The result is in Figure 6.8.

R> require(quantmod)
R> chartSeries(Close, theme = "white")

Figure 6.8 Close values of the ATL.MI asset.

We now calculate the variance of the log-returns in order to obtain the historical volatility, setting $\Delta = 1/252$ because we use daily data

R> X <- returns(Close) 

R> Delta <- 1/252 

R> sigma.hat <- sqrt(var(X)/Delta)[1, 1] 

R> sigma.hat 

[1] 0.1933289 

We have used the R function returns from the timeSeries package. In order to use the Black and Scholes formula we need to identify all quantities. We consider a call option priced on 13 May 2005. The market price was $p = 0.0004$, the strike price $K = 23$ and $S_0 = 20.55$. The expiry date was 3 June 2005, which corresponds to 15 days, thus we set $T = 15 \cdot \Delta$. The annual interest rate was $r = 0.02074$. The Black and Scholes price of the ATL.MI call option on 13 May 2005 is then obtained as follows

R> S0 <- Close[1]
R> K <- 23
R> T <- 15 * Delta
R> r <- 0.02074
R> sigma.hat <- as.numeric(sigma.hat)
R> require(fOptions)
R> p0 <- GBSOption("c", S = S0, X = K, Time = T, r = r, b = r,
+     sigma = sigma.hat)@price
R> p0
[1] 0.003125474
We notice here that there is a difference between the theoretical price $p_0$ and the market price $p$. Apart from the fact that the market price is influenced by many factors (including the fact that most of the Black and Scholes hypotheses are not satisfied), one can interpret this by saying that the market expectation of the exercise of this call is very small. From another point of view, one can instead consider the Black and Scholes formula replacing $p_0$ with $p$

$$p = S_0\Phi(d_1) - e^{-rT}K\Phi(d_2)$$

and solve it with respect to $\sigma$. The value of $\sigma$ which satisfies the equality is called the implied volatility. We can use the function GBSVolatility to solve this problem

R> p <- 4e-04
R> sigma.imp <- GBSVolatility(p, "c", S = S0, X = K, Time = T, r = r,
+     b = r)
R> sigma.imp
[1] 0.1557277
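
To make the inversion explicit, the implied volatility can also be found with a generic root finder. The sketch below uses only base R (`uniroot`); the helper names `bs_call` and `implied_vol`, the search interval and the test values are assumptions made for this illustration, and this is not a description of how GBSVolatility works internally.

```r
# Black and Scholes call price at time 0 (hypothetical helper)
bs_call <- function(S0, K, T, r, sigma) {
  d2 <- (log(S0 / K) + (r - 0.5 * sigma^2) * T) / (sigma * sqrt(T))
  d1 <- d2 + sigma * sqrt(T)
  S0 * pnorm(d1) - exp(-r * T) * K * pnorm(d2)
}

# Solve bs_call(sigma) = p for sigma on an assumed interval [lower, upper]
implied_vol <- function(p, S0, K, T, r, lower = 1e-4, upper = 5) {
  uniroot(function(s) bs_call(S0, K, T, r, s) - p,
          lower = lower, upper = upper, tol = 1e-10)$root
}

# Round trip: price an option with a known volatility, then recover it
p_test <- bs_call(S0 = 100, K = 110, T = 0.5, r = 0.02, sigma = 0.3)
iv <- implied_vol(p_test, S0 = 100, K = 110, T = 0.5, r = 0.02)
iv
```

The call price is increasing in $\sigma$, so the root is unique whenever the target price lies between the prices at the two ends of the search interval.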

As we see, the implied volatility is lower than the historical volatility. This is interpreted again to mean that the market expects a low probability of exercising the contract. The historical volatility and the implied volatility rarely match. One reason is that the Black and Scholes model assumes a fixed volatility $\sigma$ over time, while market actors know that volatility is far from being stable and try to predict its trend and levels. So, implied volatility incorporates the expectation of market actors on the options and the underlying assets. If one looks at the plot of the returns for this asset (Figure 6.9) one can see the volatility changing a lot around the beginning of 2005. To be sure about this change, we make use of the cpoint function in the sde package. This function allows for the discovery of a structural change point in the volatility of a generic stochastic differential equation, following De Gregorio and Iacus (2008). In Section 9.1 we will discuss this in detail in a more general approach to the problem of change point in the volatility.


R> require(sde)
R> cp <- cpoint(as.ts(X))
R> cp
$k0
[1] 123

$tau0
[1] 123

$theta1
[1] 0.00580984

$theta2
[1] 0.01649384

R> time(X)[cp$k0]
GMT
[1] [2005-01-12]
R> plot(X)
R> abline(v = time(X)[cp$k0], lty = 3)

Using the second part of the series will make the theoretical Black and Scholes price $p_0$ even farther from the current market price $p$. Similar evidence occurs for most of the standard options priced in the market.

Figure 6.9 Returns of the ATL.MI asset with change point estimation.


6.6.1 Volatility smiles 

The same analysis of the volatility can be done on the same asset for options with different strike prices or expiry dates. What happens in general is that, for a given maturity $T$, the implied volatility changes with the strike price $K$ in a nonlinear way. Plotted as a function of the strike price $K$, the implied volatility traces a curve, sometimes u-shaped, and this curve is called the volatility smile. Consider, for example, the price of the call options for the asset Apple, Inc. (AAPL). We have collected the prices of the options for different strike prices in Table 6.2. They refer to the same expiry date, 17 July 2009. Data were collected on 23 April 2009, so $T$ is about 60 working days. The current value of the asset was $S_0 = 123.90$. For each price we calculate the implied volatility:


Table 6.2 Call options prices p for Apple, Inc. for different strikes K. In all cases expiry date is 17 July 2009.

p   22.20  18.40  15.02  11.90   9.20   7.00   5.20   3.60   2.62
K     105    110    115    120    125    130    135    140    145

p    1.76   1.28   0.80   0.53   0.34   0.23   0.15   0.09   0.10
K     150    155    160    165    170    175    180    185    190























R> S <- yahooSeries("AAPL", from = "2009-01-02", to = "2009-04-23")
R> Close <- S[, "AAPL.Close"]
R> X <- returns(Close)
R> Delta <- 1/252
R> sigma.hat <- sqrt(var(X)/Delta)
R> sigma.hat
           AAPL.Close
AAPL.Close  0.4495024
R> Pt <- c(22.2, 18.4, 15.02, 11.9, 9.2, 7, 5.2, 3.6, 2.62, 1.76,
+     1.28, 0.8, 0.53, 0.34, 0.23, 0.15, 0.09, 0.1)
R> K <- c(105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155,
+     160, 165, 170, 175, 180, 185, 190)
R> S0 <- 123.9
R> nP <- length(Pt)
R> T <- 60 * Delta
R> r <- 0.056
R> smile <- sapply(1:nP, function(i) GBSVolatility(Pt[i], "c", S = S0,
+     X = K[i], Time = T, r = r, b = r))

and then plot the values as a function of the strike price $K$. Figure 6.10 shows this almost u-shaped volatility smile.

R> vals <- c(smile, sigma.hat)
R> plot(K, smile, type = "l", ylim = c(min(vals, na.rm = TRUE),
+     max(vals, na.rm = TRUE)), main = "")
R> abline(v = S0, lty = 3, col = "blue")
R> abline(h = sigma.hat, lty = 3, col = "red")
R> axis(2, sigma.hat, expression(hat(sigma)), col = "red")



Figure 6.10 Example of volatility smile: implied volatility as a function of the strike price $K$ for given expiry date $T$. The vertical dotted line is the current price of the underlying asset and $\hat\sigma$ is the value of the historical volatility.


6.7 Pricing of basket options

We end this chapter with a simplified approach to the pricing of derivatives written on more than one asset under the multidimensional Black and Scholes model. Assume we have $n$ assets and denote their prices by $S_1(t), S_2(t), \ldots, S_n(t)$. Assume further that $m$ independent Brownian motions $B_1(t), \ldots, B_m(t)$ act on the market as sources of noise. These noises are independent but the dynamics of asset prices may depend on one or more of them. Let $\Sigma$ be the volatility matrix defined as

$$\Sigma = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1m} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2m} \\
\vdots & \vdots & & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nm}
\end{pmatrix}.$$

The constants $\sigma_{ij}$ represent covariances between log-returns of asset prices and Brownian motions. Let further $\mu_i$, $i = 1, \ldots, n$, be $n$ constants. The $i$-th asset price satisfies the following stochastic differential equation

$$dS_i(t) = \mu_i S_i(t)\,dt + S_i(t)\sum_{j=1}^{m}\sigma_{ij}\,dB_j(t)$$

with solution

$$S_i(t) = S_i(0)\exp\left\{\left(\mu_i - \frac{1}{2}\sum_{j=1}^{m}\sigma_{ij}^2\right)t + \sum_{j=1}^{m}\sigma_{ij}B_j(t)\right\}.$$

The vector process $(S_1(t), S_2(t), \ldots, S_n(t))$ is called multidimensional geometric Brownian motion and will be the underlying process of options based on multiple assets. These kinds of options are called basket options. Examples of basket options are contracts which pay only when the difference between two assets is positive, i.e.

$$X = f(S_1, S_2) = \max(S_1(T) - S_2(T), 0)$$

or options which pay only when the maximal value of all underlying assets passes some threshold $K$

$$X = f(S_1, S_2) = \max(\max(S_1(T), S_2(T)) - K, 0).$$

Generally speaking, the payoff $f(\cdot)$ of basket options or multidimensional $T$-contingent claims is of the form

$$X = f(S_1(T), S_2(T), \ldots, S_n(T)).$$
When an equivalent martingale measure $Q$ exists (we will see that this is true when $n = m$) we obtain

$$P(t) = e^{-r(T-t)}\,E_Q(X \mid \mathcal{F}_t)$$

where $\mathcal{F}_t$ is the $\sigma$-algebra (all the information) generated by the $m$ Brownian motions up to time $t$.

In order to calculate $P(t)$ we need, as in the one-dimensional case, the equivalent martingale measure. One way to obtain it is to require the existence of the inverse of the volatility matrix $\Sigma$, but the inverse (see footnote 1) exists only for square matrices; therefore one necessary, but not sufficient, condition for the existence of an equivalent martingale measure is that $m = n$. So assume $m = n$. Introduce the vector $\lambda = \Sigma^{-1}(\mu - r\mathbf{1})$ and define the following process $M(t)$

$$M(t) = \exp\left\{-\lambda' B_t - \frac{1}{2}\lambda'\lambda\, t\right\}$$

with $B_t = (B_1(t), B_2(t), \ldots, B_n(t))'$, $\mathbf{1} = (1, 1, \ldots, 1)'$, $\mu = (\mu_1, \mu_2, \ldots, \mu_n)'$ and $r$ the interest rate. The martingale measure is obtained as in the one-dimensional case. For each $A \subset \Omega$ we define

$$Q(A) = E\left(\mathbf{1}_A M(T)\right)$$

then $Q$ is equivalent to $P$ and

$$W_t = B_t + \lambda t$$

is an $n$-dimensional Brownian motion with respect to the measure $Q$. Similarly to the one-dimensional case it can be proved that $W_t$ is a Brownian motion under $Q$ and not under $P$ and, further, that $M(t)$ is a martingale under $P$.

Under the measure $Q$ the dynamics of the underlying prices are as follows:

$$dS_i(t) = rS_i(t)\,dt + S_i(t)\sum_{j=1}^{n}\sigma_{ij}\,dW_j(t)$$

and the discounted prices $S_i^*(t) = e^{-rt}S_i(t)$ satisfy the stochastic differential equations

$$dS_i^*(t) = S_i^*(t)\sum_{j=1}^{n}\sigma_{ij}\,dW_j(t).$$

It can be proved that the discounted process is a martingale, i.e.

$$E_Q\left[S_i^*(t) \mid \mathcal{F}_s\right] = S_i^*(s).$$

Footnote 1: The inverse of a matrix $\Sigma$ is denoted by $\Sigma^{-1}$ and it is such that $\Sigma^{-1}\Sigma = I$, where $I$ is the identity matrix.
Further, to price a $T$-contingent claim with payoff $X$ we need to calculate the conditional expected value under $Q$

$$P(t) = e^{-r(T-t)}\,E_Q[X \mid \mathcal{F}_t] = C(t, S_1(t), S_2(t), \ldots, S_n(t))$$

with $X = f(S_1(T), S_2(T), \ldots, S_n(T))$.


6.7.1 Numerical implementation 

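In order to use the Monte Carlo method we need to consider translated geometric Brownian motions $S_i^{t,x_i}$. Indeed, we can write

$$C(t, x_1, x_2, \ldots, x_n) = e^{-r(T-t)}\,E_Q\left[f\left(S_1^{t,x_1}(T), S_2^{t,x_2}(T), \ldots, S_n^{t,x_n}(T)\right)\right]$$

with

$$S_i^{t,x_i}(T) = x_i\exp\left\{\left(r - \frac{1}{2}\sum_{j=1}^{n}\sigma_{ij}^2\right)(T - t) + \sum_{j=1}^{n}\sigma_{ij}\left(W_j(T) - W_j(t)\right)\right\}.$$

The hedging portfolio is then

$$H(t) = \sum_{i=1}^{n} a_i^H(t)S_i(t) + b^H(t)R(t)$$

where

$$a_i^H(t) = \frac{\partial}{\partial x_i}C(t, S_1(t), S_2(t), \ldots, S_n(t)).$$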

6.7.2 Completeness and arbitrage 

Under the above model

• the market does not allow for arbitrage opportunities if and only if $n \le m$

• the market is complete if and only if $n \ge m$

therefore $m = n$ is a necessary condition for complete and arbitrage-free markets. This result is due to Harrison and Pliska (1981). See also references therein.

6.7.3 An example with two assets 

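Assume we have two assets only

$$\begin{aligned}
dS_1(t) &= \mu_1 S_1(t)\,dt + S_1(t)\left(\sigma_{11}\,dB_1(t) + \sigma_{12}\,dB_2(t)\right) \\
dS_2(t) &= \mu_2 S_2(t)\,dt + S_2(t)\left(\sigma_{21}\,dB_1(t) + \sigma_{22}\,dB_2(t)\right)
\end{aligned}$$

with $B_1(t)$ and $B_2(t)$ two independent Brownian motions. The log-returns are defined as

$$X(t) = \ln\left(S_1(t)/S_1(t - 1)\right), \qquad Y(t) = \ln\left(S_2(t)/S_2(t - 1)\right).$$

Let us calculate the correlation between $X(t)$ and $Y(t)$. Denote by $\Delta B_j(t) = B_j(t) - B_j(t - 1) \sim N(0, 1)$. Replace $S_1$ and $S_2$ in the expressions of the log-returns

$$\begin{aligned}
X(t) &= \mu_1 - \frac{1}{2}\left(\sigma_{11}^2 + \sigma_{12}^2\right) + \sigma_{11}\Delta B_1(t) + \sigma_{12}\Delta B_2(t) \\
Y(t) &= \mu_2 - \frac{1}{2}\left(\sigma_{21}^2 + \sigma_{22}^2\right) + \sigma_{21}\Delta B_1(t) + \sigma_{22}\Delta B_2(t).
\end{aligned}$$

Given that $B_1$ and $B_2$ are independent, so are $\Delta B_1$ and $\Delta B_2$; then $X(t)$ and $Y(t)$ are two Gaussian random variables with

$$\mathrm{Var}\,X(t) = \sigma_{11}^2 + \sigma_{12}^2, \qquad \mathrm{Var}\,Y(t) = \sigma_{21}^2 + \sigma_{22}^2.$$

The covariance is given by

$$\begin{aligned}
\mathrm{Cov}(X(t), Y(t)) &= E\,X(t)Y(t) - E\,X(t)\,E\,Y(t) \\
&= \sigma_{11}\sigma_{21}E(\Delta B_1(t))^2 + \sigma_{12}\sigma_{22}E(\Delta B_2(t))^2 + (\sigma_{11}\sigma_{22} + \sigma_{12}\sigma_{21})E(\Delta B_1(t)\Delta B_2(t)) \\
&= \sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22}
\end{aligned}$$

and the correlation by

$$\mathrm{Cor}(X(t), Y(t)) = \frac{\mathrm{Cov}(X(t), Y(t))}{\sqrt{\mathrm{Var}\,X(t)}\sqrt{\mathrm{Var}\,Y(t)}} = \frac{\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22}}{\sqrt{\sigma_{11}^2 + \sigma_{12}^2}\sqrt{\sigma_{21}^2 + \sigma_{22}^2}}.$$

We now set $\sigma_{11} = \sigma_1$, $\sigma_{21} = \sigma_2$, $\sigma_{12} = \sigma_1\rho$ and $\sigma_{22} = 0$, i.e.

$$\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_1 & \rho\sigma_1 \\ \sigma_2 & 0 \end{pmatrix}$$

then

$$\mathrm{Cor}(X(t), Y(t)) = \frac{\sigma_1\sigma_2}{|\sigma_1\sigma_2|\sqrt{1 + \rho^2}} = \frac{\pm 1}{\sqrt{1 + \rho^2}}.$$

Hence, from the above, given the sign of $\sigma_1\sigma_2$ and the value of $\rho$, it is possible to model a two-asset option very simply.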
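
The correlation formula can be verified by simulation. The following base R sketch draws the increments $\Delta B_1$, $\Delta B_2$ and compares the sample correlation of the two noise terms with $\pm 1/\sqrt{1 + \rho^2}$. The numerical values of `sigma1`, `sigma2` and `rho` are arbitrary choices made for illustration; the constant drift terms are omitted because they do not affect the correlation.

```r
set.seed(42)
sigma1 <- 0.2; sigma2 <- -0.3; rho <- 0.5  # illustrative values only

# Volatility matrix with sigma11 = sigma1, sigma12 = sigma1 * rho,
# sigma21 = sigma2, sigma22 = 0
Sigma <- matrix(c(sigma1, sigma1 * rho,
                  sigma2, 0), nrow = 2, byrow = TRUE)

n <- 200000
dB <- matrix(rnorm(2 * n), ncol = 2)  # columns: Delta B_1, Delta B_2
# row k holds (sigma11 dB1 + sigma12 dB2, sigma21 dB1 + sigma22 dB2)
noise <- dB %*% t(Sigma)

cor_sim <- cor(noise[, 1], noise[, 2])
cor_theory <- sign(sigma1 * sigma2) / sqrt(1 + rho^2)
c(simulated = cor_sim, theoretical = cor_theory)
```

With $\sigma_1\sigma_2 < 0$, as in this example, the correlation is negative; flipping the sign of either volatility flips the sign of the correlation.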
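6.7.4 Numerical pricing

Consider $n$ assets $(S_1, S_2, \ldots, S_n)$ and $n$ independent Brownian motions $(W_1, W_2, \ldots, W_n)$. The stochastic differential equations are

$$dS_i(t) = rS_i(t)\,dt + S_i(t)\sum_{j=1}^{n}\sigma_{ij}\,dW_j(t).$$

The payoff is $X = f(S_1(T), S_2(T), \ldots, S_n(T))$ and the price is $P(t) = C(t, S_1(t), \ldots, S_n(t))$, where

$$C(t, x_1, x_2, \ldots, x_n) = e^{-r(T-t)}\,E_Q\left[f\left(S_1^{t,x_1}(T), \ldots, S_n^{t,x_n}(T)\right)\right].$$

Further

$$S_i^{t,x_i}(T) = x_i\exp\left\{\left(r - \frac{1}{2}\sum_{j=1}^{n}\sigma_{ij}^2\right)(T - t) + \sum_{j=1}^{n}\sigma_{ij}\left(W_j(T) - W_j(t)\right)\right\}.$$

The algorithm is as follows:

• simulate $N$ samples of size $n$ from the $N(0, 1)$ distribution

$$(z_1^k, \ldots, z_n^k), \quad k = 1, \ldots, N$$

• calculate

$$e^{-r(T-t)}\,\frac{1}{N}\sum_{k=1}^{N} f\left(s_1^k, s_2^k, \ldots, s_n^k\right)$$

where

$$s_i^k = x_i\exp\left\{\left(r - \frac{1}{2}\sum_{j=1}^{n}\sigma_{ij}^2\right)(T - t) + \sqrt{T - t}\sum_{j=1}^{n}\sigma_{ij}z_j^k\right\}.$$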
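
The two steps of the algorithm can be sketched in base R as follows. The function name `mc_basket`, the volatility matrix and the initial prices are assumptions made for illustration. As a sanity check, the example prices the exchange option $\max(S_1(T) - S_2(T), 0)$ and compares the result with Margrabe's closed-form formula, which is not discussed in the text and is used here only for validation.

```r
# Monte Carlo price of a basket option with payoff f (hypothetical helper)
mc_basket <- function(f, x, Sigma, r, t = 0, T = 1, N = 100000) {
  n <- length(x)
  z <- matrix(rnorm(N * n), nrow = N, ncol = n)    # row k: (z_1^k, ..., z_n^k)
  drift <- (r - 0.5 * rowSums(Sigma^2)) * (T - t)  # one drift term per asset
  logS <- sweep(sqrt(T - t) * z %*% t(Sigma), 2, drift, "+")
  s <- sweep(exp(logS), 2, x, "*")                 # row k: (s_1^k, ..., s_n^k)
  exp(-r * (T - t)) * mean(apply(s, 1, f))         # discounted average payoff
}

x <- c(100, 95)
Sigma <- matrix(c(0.20, 0.10,
                  0.05, 0.30), nrow = 2, byrow = TRUE)
r <- 0.01

set.seed(123)
p_mc <- mc_basket(function(s) max(s[1] - s[2], 0), x, Sigma, r)

# Margrabe's formula for the exchange option (validation only)
v <- sqrt(sum((Sigma[1, ] - Sigma[2, ])^2))  # volatility of S_1 / S_2, T - t = 1
d1 <- (log(x[1] / x[2]) + 0.5 * v^2) / v
p_margrabe <- x[1] * pnorm(d1) - x[2] * pnorm(d1 - v)
c(MonteCarlo = p_mc, Margrabe = p_margrabe)
```

Any other payoff of the terminal prices, such as `function(s) max(max(s) - K, 0)` for a given strike `K`, can be passed as `f` without changing the simulation step.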


6.8 Solution to exercises 

Solution 6.1 (to Exercise 6.1) We start from $P_t = e^{-r(T-t)}K\Phi(-d_2) - S_t\Phi(-d_1)$ in (6.10) and recall that $\Phi(-z) = 1 - \Phi(z)$. Therefore

$$\begin{aligned}
P_t &= e^{-r(T-t)}K(1 - \Phi(d_2)) - S_t(1 - \Phi(d_1)) \\
&= S_t\Phi(d_1) - e^{-r(T-t)}K\Phi(d_2) + e^{-r(T-t)}K - S_t \\
&= C_t - S_t + e^{-r(T-t)}K,
\end{aligned}$$

which is (6.11).
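
The identity just derived can be checked numerically. The base R sketch below computes the call and put prices from the Black and Scholes formulas and evaluates the gap between the two sides of the parity relation; the helper name `bs_call_put` and the parameter values are illustrative assumptions.

```r
# Black and Scholes call and put prices with time to maturity tau = T - t
# (hypothetical helper, base R only)
bs_call_put <- function(S, K, r, sigma, tau) {
  d2 <- (log(S / K) + (r - 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  d1 <- d2 + sigma * sqrt(tau)
  call <- S * pnorm(d1) - exp(-r * tau) * K * pnorm(d2)
  put <- exp(-r * tau) * K * pnorm(-d2) - S * pnorm(-d1)
  c(call = call, put = put)
}

pr <- bs_call_put(S = 100, K = 110, r = 0.01, sigma = 0.5, tau = 1)
# P_t - (C_t - S_t + exp(-r * tau) * K) should vanish up to rounding error
parity_gap <- unname(pr["put"] - (pr["call"] - 100 + exp(-0.01 * 1) * 110))
parity_gap
```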
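Solution 6.2 (to Exercise 6.2) First we notice that $M_T$ is positive, and this implies that $Q(A) = E(\mathbf{1}_A M_T) \ge 0$ for all $A \in \mathcal{A}$. Further

$$Q(\Omega) = E(\mathbf{1}_\Omega M_T) = E\,M_T = e^{-\frac{\lambda^2}{2}T}E\left(e^{-\lambda B_T}\right) = e^{-\frac{\lambda^2}{2}T}e^{\frac{\lambda^2}{2}T} = 1$$

because $e^{-\lambda B_T}$ has the log-normal distribution. Now, let us consider two disjoint sets $A$ and $B$ in $\mathcal{A}$. Clearly, $\mathbf{1}_{A \cup B}(\omega) = \mathbf{1}_A(\omega) + \mathbf{1}_B(\omega)$. Then

$$Q(A \cup B) = E(\mathbf{1}_{A \cup B}M_T) = E(\mathbf{1}_A M_T) + E(\mathbf{1}_B M_T) = Q(A) + Q(B).$$

Finally, let $A_i$, $i = 1, 2, \ldots$, be such that $A_i \cap A_j = \emptyset$ for every $i \ne j$. Then

$$Q\left(\bigcup_{i=1}^{\infty}A_i\right) = E\left(\mathbf{1}_{\bigcup_{i=1}^{\infty}A_i}M_T\right) = E\left(\sum_{i=1}^{\infty}\mathbf{1}_{A_i}M_T\right) = \sum_{i=1}^{\infty}E\left(\mathbf{1}_{A_i}M_T\right) = \sum_{i=1}^{\infty}Q(A_i)$$

where we admit that we can exchange expectation and summation. This has to be verified, but we do not do it here.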

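Solution 6.3 (to Exercise 6.3) Clearly, $X_0 = 1$ and $X_t$ is $\mathcal{F}_t$-measurable. Assume that $X_t$ is a martingale. Then, $X_t$ being a martingale, its expected value is constant. Hence

$$E\,X_t = E\exp\{\mu t + \sigma B_t\} = e^{\mu t}E\,e^{\sigma B_t} = e^{\mu t}e^{\frac{1}{2}\sigma^2 t} = e^{\left(\mu + \frac{1}{2}\sigma^2\right)t}.$$

Then, $E\,X_t$ is constant and independent of $t$ only if $\mu = -\frac{1}{2}\sigma^2$. Now, let $\mu = -\frac{1}{2}\sigma^2$. Let us verify that $X_t$ is a martingale.

$$E|X_t| = E\,X_t = e^{\left(\mu + \frac{1}{2}\sigma^2\right)t} = e^0 = 1 < \infty.$$

Further,

$$\begin{aligned}
E\{X_t \mid \mathcal{F}_s\} &= E\left\{e^{\mu t + \sigma B_t} \mid \mathcal{F}_s\right\} = e^{\mu t}E\left\{\exp(\sigma B_s)\exp(\sigma(B_t - B_s)) \mid \mathcal{F}_s\right\} \\
&= e^{\mu t}e^{\sigma B_s}E\{\exp(\sigma(B_t - B_s))\} = e^{\mu t}e^{\sigma B_s}e^{\frac{1}{2}\sigma^2(t - s)} \\
&= e^{-\frac{1}{2}\sigma^2 t}e^{\sigma B_s}e^{\frac{1}{2}\sigma^2(t - s)} = e^{\mu s + \sigma B_s} = X_s.
\end{aligned}$$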


6.9 Bibliographical notes 

The standard reference for option pricing is the famous book of Hull (2000). 
This book contains all the basic theory and a lot of insights about option pricing 
in practice. Advanced books for mathematically educated readers are Shreve 
(2004a,b) and Musiela and Rutkowski (2005). An intermediate approach is 
contained in Benth (2004), Mikosch (1998), Wilmott et al. (1995) and Ross
(2003). The book of Haug (1997) contains an extensive list of exact formulas for
pricing including non-European options. The above list of references is largely
incomplete, but those books are easily found in any library of practitioners
and researchers.

7


American options 


7.1 Finite difference methods 

American options are similar to European options with the peculiarity that they
can be exercised during the whole time interval $[0, T]$.

Assume that we own an American call contract with a strike price of 40 €, 
expiry date one month and the current price of the underlying asset is 50 €. 

If this is the case, it will appear very profitable to exercise the option immedi¬ 
ately and gain 10 €. But, if we also own a portfolio which includes the underlying 
asset of this option and we want to keep it for at least one month, then exercising 
now is not the best strategy. So we want to wait a little more. 

Another good reason to wait is that there is still some probability (even if 
very small) that the price of the underlying asset decreases below 40 €; in such 
a case the American call option plays the role of a warranty against the decrease 
in value of the assets in our portfolio. 

On the contrary, if we think that the asset in our portfolio is overvalued by 
the market it becomes interesting to understand when it is more convenient to 
exercise our option or sell the underlying asset. 

Notice that, in the absence of dividends, the value of an American call coincides with the value of the corresponding European option, while this is untrue for put options or, in general, in the case of dividends.

Consider now an example of an American put option: assume that the exercise 
price is fixed to 10 € and today’s value of the underlying asset is almost zero. 
If we exercise immediately we have a payoff of about 10 €. If we keep waiting, 
the payoff can decrease below 10 €. In this case, the American put option should 
be exercised immediately. In general, the value of an American put is a function
of the initial price of the asset $S_0$: the lower the initial price, the higher the value of the option.


Option Pricing and Estimation of Financial Models with R, First Edition. Stefano M. Iacus. 
American options which can be exercised only at prescribed dates before the expiry date are called Bermudan.

The price of an American put must satisfy the following Black and Scholes inequality:

$$\frac{\partial C(t, x)}{\partial t} + rx\frac{\partial C(t, x)}{\partial x} + \frac{1}{2}\sigma^2x^2\frac{\partial^2 C(t, x)}{\partial x^2} \le rC(t, x) \qquad (7.1)$$

under additional technical regularity conditions which ensure the continuity of the quantities involved. This function $C(t, x)$ should be evaluated at each time instant and each value of the underlying asset $x = S_t$ to decide whether it is better to keep the option alive or to exercise it. It is in general an optimal stopping time problem. An American put should be exercised if the current value of the underlying asset is below the exercise frontier, say $f_t$, i.e. if $S_t < f_t$.

Exact formulas for this problem do not exist even in the simple case, but several approximation techniques have been proposed. The first class of techniques is called the finite difference method, which consists in the search for the solution of the above Black and Scholes inequality (7.1) using numerical arguments. We will present two standard methods in the following (see also Hull 2000).

7.2 Explicit finite-difference method 

Suppose we have a put option with final maturity date $T$. The idea is to partition the time interval $[0, T]$ into $N$ intervals of the same length, say $\Delta t = T/N$. So we have $N + 1$ time instants $t_i = i\Delta t$, $i = 0, 1, \ldots, N$. We also need to discretize the potential support of the process $S_t$ for $t \in [0, T]$. We denote by $[S_{\min}, S_{\max}]$ this support. Those two values are chosen in a substantive way and case by case for each option. Then, we divide this interval into $M$ subintervals of the same length and denote by $\Delta S = (S_{\max} - S_{\min})/M$ the size of the increments of the price process $S_t$. We denote by $x_j = S_{\min} + j\Delta S$, $j = 0, 1, \ldots, M$, the $M + 1$ points of the grid. So we now have a grid of $(N + 1) \times (M + 1)$ points in which we evaluate numerically the solution $C(t, x)$ of (7.1). We assume that the strike price $K$ is in between $S_{\min}$ and $S_{\max}$, otherwise the problem of pricing the option is not particularly interesting. To shorten the notation we define the following quantities:

$$C_{i,j} = C(t_i, x_j) = C(i\Delta t, S_{\min} + j\Delta S), \quad i = 0, 1, \ldots, N, \quad j = 0, 1, \ldots, M.$$

Figure 7.1 gives a representation of the grid in which each point of the grid $C_{i,j}$ corresponds to the above quantity.

Figure 7.1 Grid of $(N + 1) \times (M + 1)$ points in which the function is being evaluated numerically.

We now use different approximations of the partial derivatives of the function $C(t, x)$. We use the centered derivative

$$\frac{\partial}{\partial x}C(t, x) = \frac{C_{i,j+1} - C_{i,j-1}}{2\Delta S} \qquad (7.2)$$

and, in order to construct the second partial derivative with respect to $x$, we introduce these two approximations

$$\frac{\partial}{\partial x}C(t, x) = \frac{C_{i,j+1} - C_{i,j}}{\Delta S} \quad \text{(forward approximation)} \qquad (7.3)$$

and

$$\frac{\partial}{\partial x}C(t, x) = \frac{C_{i,j} - C_{i,j-1}}{\Delta S} \quad \text{(backward approximation)} \qquad (7.4)$$

Finally, mixing both (7.3) and (7.4) we obtain

$$\frac{\partial^2}{\partial x^2}C(t, x) = \frac{\frac{C_{i,j+1} - C_{i,j}}{\Delta S} - \frac{C_{i,j} - C_{i,j-1}}{\Delta S}}{\Delta S} = \frac{C_{i,j+1} + C_{i,j-1} - 2C_{i,j}}{\Delta S^2} \qquad (7.5)$$

while for the time derivative we use the forward approximation

$$\frac{\partial}{\partial t}C(t, x) = \frac{C_{i+1,j} - C_{i,j}}{\Delta t} \qquad (7.6)$$

Now, by replacing (7.2), (7.5) and (7.6) in the Black and Scholes equation, with the space derivatives evaluated at the time level $i + 1$ and taking $x_j = j\Delta S$, we get

$$rC_{i,j} = \frac{C_{i+1,j} - C_{i,j}}{\Delta t} + rj\Delta S\,\frac{C_{i+1,j+1} - C_{i+1,j-1}}{2\Delta S} + \frac{1}{2}\sigma^2j^2(\Delta S)^2\,\frac{C_{i+1,j+1} + C_{i+1,j-1} - 2C_{i+1,j}}{\Delta S^2}$$
which can be rewritten as

$$C_{i,j} = a_j^*C_{i+1,j-1} + b_j^*C_{i+1,j} + c_j^*C_{i+1,j+1} \qquad (7.7)$$

with

$$\begin{aligned}
a_j^* &= \frac{1}{1 + r\Delta t}\left(-\frac{1}{2}rj\Delta t + \frac{1}{2}\sigma^2j^2\Delta t\right) \\
b_j^* &= \frac{1}{1 + r\Delta t}\left(1 - \sigma^2j^2\Delta t\right) \\
c_j^* &= \frac{1}{1 + r\Delta t}\left(\frac{1}{2}rj\Delta t + \frac{1}{2}\sigma^2j^2\Delta t\right)
\end{aligned}$$

which provides an iterative approximation formula for $C(t, x)$, the price of the option. The first thing to remark is that the present value $C_{i,j}$ depends on the future values $C_{i+1,j-1}$, $C_{i+1,j}$ and $C_{i+1,j+1}$, while our interest is in $C(0, S_0)$ in order to price the American put option. Therefore, in order to get the solution we need to start from the rightmost part of the grid. Figure 7.2 shows this fact and also highlights that, luckily enough, we know the values on the edges of this grid. Indeed, at time $T$, we know that the payoff is given by $\max(K - S_T, 0)$ like the usual put option. Hence

$$C_{N,j} = \max(K - j\Delta S, 0), \quad j = 0, 1, \ldots, M. \qquad (7.8)$$

This gives us the rightmost dots (•) in Figure 7.2. We now look at the upper edge of the figure. This corresponds to the case when $S_t = S_{\max}$ at each given time $t$. Therefore, given that $S_{\max} > K$, the payoff is always zero. Then

$$C_{i,M} = 0, \quad i = 0, 1, \ldots, N, \qquad (7.9)$$

which correspond to the upper diamonds (♦) in Figure 7.2. Finally, if $S_t = S_{\min}$, the corresponding payoff at each given time $t$ is the maximal payoff, i.e.

$$C_{i,0} = K - S_{\min}, \quad i = 0, 1, \ldots, N, \qquad (7.10)$$

i.e. the squares (■) at the bottom of Figure 7.2.

Figure 7.2 Explicit finite difference method. The value $C(t, x)$ depends on future values of the same function. Values on the edges are known by construction, so the algorithm proceeds from right to left iteratively to determine $C(0, S_0)$.

We are now able to write the equation for $i = N - 1$

$$C_{N-1,j} = a_j^*C_{N,j-1} + b_j^*C_{N,j} + c_j^*C_{N,j+1}, \quad j = 1, 2, \ldots, M - 1$$

with

$$C_{N-1,0} = K - S_{\min}, \qquad C_{N-1,M} = 0.$$

The above is a system of $M - 1$ equations with $M - 1$ unknowns $C_{N-1,1}, C_{N-1,2}, \ldots, C_{N-1,M-1}$ but each equation has an explicit solution, so there is no need to invert the matrix of the system. Once these values are obtained, we need to compare the value of $C_{N-1,j}$ with the corresponding value of the European put in order to satisfy the Black and Scholes inequality (7.1). If this value $C_{N-1,j}$ is smaller than the corresponding value $C_{N,j}$ it is convenient to exercise. Therefore $N - 1$ will be the optimal time. In practice this means that, if at time $t = (N - 1)\Delta t$ the value of the underlying asset is $S_t = x_j = S_{\min} + j\Delta S$ and $C_{N-1,j} < C_{N,j}$, the value of the American put is lower than the value of the corresponding European put (i.e. the present payoff), so it is no longer convenient to keep the contract alive. We need to repeat the analysis for each value of $j$ in order to determine the value of $j$ which identifies the frontier of exercise. Once this is done, the last two columns are completely determined and we can proceed with $i = N - 2$ back to $i = 0$. Once all the points of the grid have been determined, the first column, $C_{0,1}, C_{0,2}, \ldots, C_{0,M-1}$, corresponds to the price of the option at time $t = 0 \cdot \Delta t$ (or $i = 0$) we were looking for.

Example 7.2.1 Suppose we have an American put with the following parameters:
strike price $K = 30$, time to maturity $T = 1$, volatility $\sigma = 0.4$,
interest rate $r = 0.05$ and current price of the underlying asset $S_0 = 36$.
Let us set $S_{\min} = 0$ and $S_{\max} = 60$. We further choose $N = 10$ so
that $\Delta t = 1/10 = 0.1$ and $M = 10$, with $\Delta S = 60/10 = 6$. We now
fix all needed quantities. The rightmost column of Figure 7.2 is, top to bottom
(i.e. for $j = 10, 9, \ldots, 0$), as follows:

\[ \max(30 - j\Delta S, 0) = 0, 0, 0, 0, 0, 0, 6, 12, 18, 24, 30. \]

The topmost row is always zero and the bottom line is constantly equal to
$K - S_{\min} = 30 - 0 = 30$. Now the explicit difference method is ready to
start. We do not do it manually for all entries of the grid but instead only
calculate one single point. Let us calculate $C_{N-1,3}$ (i.e. $i = N - 1$,
$j = 3$) using the recursive formula (7.7). We need the three constants
$a_3^*$, $b_3^*$ and $c_3^*$:


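\[ a_3^* = \frac{-\frac{1}{2} r\, 3\, \Delta t + \frac{1}{2}\sigma^2 3^2 \Delta t}{1 + r\Delta t} \approx 0.0642, \quad
   b_3^* = \frac{1 - \sigma^2 3^2 \Delta t}{1 + r\Delta t} \approx 0.8517, \quad
   c_3^* = \frac{\frac{1}{2} r\, 3\, \Delta t + \frac{1}{2}\sigma^2 3^2 \Delta t}{1 + r\Delta t} \approx 0.0791 \]

and then

\[ C_{N-1,3} = a_3^* C_{N,2} + b_3^* C_{N,3} + c_3^* C_{N,4}
   = 0.0642 \cdot 18 + 0.8517 \cdot 12 + 0.0791 \cdot 6 \approx 11.85. \]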
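The single backward step worked out above can be checked numerically. The following is only a minimal sketch: the helper names `a`, `b`, `cc`, `payoff` and `C_N1_3` are ours and are not part of the code used in the rest of the chapter, and the coefficient formulas are assumed to be those of the explicit scheme as coded in AmericanPutExp.

```r
# Parameters of Example 7.2.1
K <- 30; r <- 0.05; sigma <- 0.4
N <- 10; M <- 10
Dt <- 1 / N          # time step
DS <- 60 / M         # price step, with Smin = 0 and Smax = 60

# Coefficients of the explicit scheme for a given price index j
a  <- function(j) (-0.5 * r * j * Dt + 0.5 * sigma^2 * j^2 * Dt) / (1 + r * Dt)
b  <- function(j) (1 - sigma^2 * j^2 * Dt) / (1 + r * Dt)
cc <- function(j) (0.5 * r * j * Dt + 0.5 * sigma^2 * j^2 * Dt) / (1 + r * Dt)

# Terminal payoffs C[N, j] for j = 0, ..., M; R vectors are 1-based,
# so payoff[j + 1] holds C[N, j]
payoff <- pmax(K - (0:M) * DS, 0)

# One backward step at j = 3: uses C[N, 2], C[N, 3] and C[N, 4]
j <- 3
C_N1_3 <- a(j) * payoff[j] + b(j) * payoff[j + 1] + cc(j) * payoff[j + 2]
round(c(a = a(j), b = b(j), c = cc(j), C = C_N1_3), 4)
```

The three coefficients and the resulting value agree with the hand calculation up to rounding.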


The next algorithm is a simple implementation of the finite difference method
which we will use later to calculate the grid of Example 7.2.1.

R> AmericanPutExp <- function(Smin = 0, Smax, T = 1, N = 10, M = 10,
+     K, r = 0.05, sigma = 0.01) {
+     Dt = T/N
+     DS = (Smax - Smin)/M
+     t <- seq(0, T, by = Dt)
+     S <- seq(Smin, Smax, by = DS)
+     # coefficients a*, b* and c* of the explicit scheme
+     A <- function(j) (-0.5 * r * j * Dt + 0.5 * sigma^2 * j^2 *
+         Dt)/(1 + r * Dt)
+     B <- function(j) (1 - sigma^2 * j^2 * Dt)/(1 + r * Dt)
+     C <- function(j) (0.5 * r * j * Dt + 0.5 * sigma^2 * j^2 *
+         Dt)/(1 + r * Dt)
+     # grid of prices: rows are asset values (top = Smax), columns are times
+     P <- matrix(, M + 1, N + 1)
+     colnames(P) <- round(t, 2)
+     rownames(P) <- round(rev(S), 2)
+     # boundary conditions (7.8), (7.9) and (7.10)
+     P[M + 1, ] <- K
+     P[1, ] <- 0
+     P[, N + 1] <- sapply(rev(S), function(x) max(K - x, 0))
+     optTime <- matrix(FALSE, M + 1, N + 1)
+     optTime[M + 1, ] <- TRUE
+     optTime[which(P[, N + 1] > 0), N + 1] <- TRUE
+     # backward recursion from i = N - 1 to i = 0
+     for (i in (N - 1):0) {
+         for (j in 1:(M - 1)) {
+             J <- M + 1 - j
+             I <- i + 1
+             P[J, I] <- A(j) * P[J + 1, I + 1] + B(j) * P[J, I +
+                 1] + C(j) * P[J - 1, I + 1]
+             # mark the node when early exercise is convenient
+             if (P[J, I] < P[J, N + 1])
+                 optTime[J, I] <- TRUE
+         }
+     }
+     colnames(optTime) <- colnames(P)
+     rownames(optTime) <- rownames(P)
+     ans <- list(P = P, t = t, S = S, optTime = optTime, N = N,
+         M = M)
+     class(ans) <- "AmericanPut"
+     return(invisible(ans))
+ }
The function AmericanPutExp is a simple implementation of the finite difference
method, so it is by no means optimal and, given its iterative nature, it should be
coded in C, but for reasons we will explain shortly, it is not worth doing that.
The only thing to note is that the function also calculates the frontier of optimal
exercise and stores this information in the object before exiting. Before using the
function AmericanPutExp we also define a plot method for the object of class
AmericanPut that this function creates.

R> plot.AmericanPut <- function(obj) {
+     plot(range(obj$t), range(obj$S), type = "n", axes = F,
+         xlab = "t", ylab = "S")
+     axis(1, obj$t, obj$t)
+     axis(2, obj$S, obj$S)
+     abline(v = obj$t, h = obj$S, col = "darkgray",
+         lty = "dotted")
+     # print the option value at each node, in black where exercise is optimal
+     for (i in 0:obj$N) {
+         for (j in 0:obj$M) {
+             J <- obj$M + 1 - j
+             I <- i + 1
+             cl <- "grey"
+             if (obj$optTime[J, I])
+                 cl <- "black"
+             text(obj$t[i + 1], obj$S[j + 1], round(obj$P[J, I],
+                 2), cex = 0.75, col = cl)
+         }
+     }
+     # dashed line: frontier of optimal exercise
+     DS <- mean(obj$S[1:2])
+     y <- as.numeric(apply(obj$optTime, 2, function(x) which(x)[1]))
+     lines(obj$t, obj$S[obj$M + 2 - y] + DS, lty = 2)
+ }


We are now ready to replicate Example 7.2.1 with R

R> put <- AmericanPutExp(Smax = 60, sigma = 0.4, K = 30)
R> round(put$P, 2)

and eventually plot the whole grid as in Figure 7.3 with a simple plot
command.

R> plot(put)

Figure 7.3 Explicit finite difference method grid for the American put option of
Example 7.2.1.

Now, looking at Figure 7.3 and given that $S_0 = 36$, the price of the option is
2.15. We should keep the option until the price $S_t$ crosses the frontier marked
as a dashed line in the picture.

7.2.1 Numerical stability 

Now, let us return to the expression of the solution of the finite difference
method in Equation (7.7). We have seen that the present price $C_{i,j}$ is the
average of future prices when the underlying stock decreases its value,
$C_{i+1,j-1}$, remains unchanged, $C_{i+1,j}$, or increases its value,
$C_{i+1,j+1}$. The coefficients, up to the scaling factor
$1/(1 + r\Delta t) > 0$, are such that

\[ (1 + r\Delta t) \times (a^* + b^* + c^*) = 1 \]

so, if they are also positive, they can be interpreted as the probabilities of
decrease, no change and increase of the value of the underlying asset. The
problem is that in some cases this is not true. In particular, if
$j < r/\sigma^2$, then $a^* < 0$ and if $\sigma^2 j^2 \Delta t > 1$ then
$b^* < 0$. This fact also causes numerical instability because, for some
iteration, the value of the average $C_{i,j}$ based on those negative
coefficients may be negative as well. These negative values then propagate
backward in the algorithm, leading to unrealistic (actually wrong) negative
prices of the option. For example, if we change M without increasing N in our
example we get this result

R> put.bad <- AmericanPutExp(Smax = 60, sigma = 0.4, K = 30, M = 15)
R> round(put.bad$P, 2)
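The sign conditions above can be inspected directly. The sketch below tabulates the three coefficients on the interior nodes for M = 10 and M = 15 with N = 10 and the parameters of Example 7.2.1; the helper `coefs` and the objects `tab10`, `tab15`, `neg10` and `neg15` are our own names, introduced only for this illustration.

```r
r <- 0.05; sigma <- 0.4; N <- 10
Dt <- 1 / N   # time step for T = 1

# Explicit-scheme coefficients as functions of the price index j
coefs <- function(j) {
    a  <- (-0.5 * r * j * Dt + 0.5 * sigma^2 * j^2 * Dt) / (1 + r * Dt)
    b  <- (1 - sigma^2 * j^2 * Dt) / (1 + r * Dt)
    cc <- (0.5 * r * j * Dt + 0.5 * sigma^2 * j^2 * Dt) / (1 + r * Dt)
    cbind(j = j, a = a, b = b, c = cc)
}

tab10 <- coefs(1:9)    # interior nodes j = 1, ..., M - 1 for M = 10
tab15 <- coefs(1:14)   # interior nodes j = 1, ..., M - 1 for M = 15

# The rescaled coefficients always sum to one ...
range((1 + r * Dt) * rowSums(tab15[, c("a", "b", "c")]))

# ... but b* is negative wherever sigma^2 * j^2 * Dt > 1
neg10 <- tab10[tab10[, "b"] < 0, "j"]
neg15 <- tab15[tab15[, "b"] < 0, "j"]
neg10
neg15
```

With N fixed, the number of interior nodes with a negative $b^*$ grows with M, which is consistent with the deterioration observed when M is increased alone; reducing $\Delta t$ (a larger N) pushes the threshold on $j$ upward.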
7.3 Implicit finite-difference method

The implicit finite difference method has been designed to overcome the problem
of numerical instability of the previous one. The finite difference method is
also used in European option pricing (Brennan and Schwartz 1978). The idea is to
approximate the future value $C_{i+1,j}$ with the present values $C_{i,j-1}$,
$C_{i,j}$ and $C_{i,j+1}$. Although it appears to be a more natural approach
than the previous one, the solution of this method is more involved. The first
step consists in the approximation of the partial derivatives as follows:


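\[ \frac{\partial}{\partial t} C(t, x) = \frac{C_{i+1,j} - C_{i,j}}{\Delta t} \]

\[ \frac{\partial}{\partial x} C(t, x) = \frac{C_{i,j+1} - C_{i,j-1}}{2\Delta S} \]

\[ \frac{\partial^2}{\partial x^2} C(t, x) = \frac{C_{i,j+1} + C_{i,j-1} - 2C_{i,j}}{(\Delta S)^2} \]

and inserting these derivatives in the Black and Scholes equation we get this new
approximated solution for $C_{i,j}$:

\[ C_{i+1,j} = a_j C_{i,j-1} + b_j C_{i,j} + c_j C_{i,j+1} \]

with

\[ a_j = \frac{1}{2} r j \Delta t - \frac{1}{2} \sigma^2 j^2 \Delta t \]

\[ b_j = 1 + \sigma^2 j^2 \Delta t + r \Delta t \]

\[ c_j = -\frac{1}{2} r j \Delta t - \frac{1}{2} \sigma^2 j^2 \Delta t \]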

and the same conditions on the edges of the $(N + 1) \times (M + 1)$ grid as in
(7.8), (7.9), and (7.10). In this case the solution is not explicit. Indeed, let
$i = N - 1$, then

\[ C_{N,j} = a_j C_{N-1,j-1} + b_j C_{N-1,j} + c_j C_{N-1,j+1}, \quad j = 1, \ldots, M - 1, \]

which is the following system in the $M - 1$ unknowns $C_{N-1,j}$,
$j = 1, \ldots, M - 1$,

\[ \begin{aligned}
a_1 C_{N-1,0} + b_1 C_{N-1,1} + c_1 C_{N-1,2} &= C_{N,1} \\
a_2 C_{N-1,1} + b_2 C_{N-1,2} + c_2 C_{N-1,3} &= C_{N,2} \\
&\;\;\vdots \\
a_{M-1} C_{N-1,M-2} + b_{M-1} C_{N-1,M-1} + c_{M-1} C_{N-1,M} &= C_{N,M-1}
\end{aligned} \]

where the known quantities are the terms $C_{N,j}$, $a_j$, $b_j$ and $c_j$. To
solve this system we write it in matrix form:

\[ A x = b \]

with

\[ A = \begin{pmatrix}
b_1 & c_1 & 0 & \cdots & \cdots & 0 \\
a_2 & b_2 & c_2 & 0 & \cdots & 0 \\
0 & \ddots & \ddots & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & a_{M-2} & b_{M-2} & c_{M-2} \\
0 & \cdots & \cdots & 0 & a_{M-1} & b_{M-1}
\end{pmatrix} \]

the vector of known terms $b$ is

\[ b' = \left( C_{N,1} - a_1 C_{N-1,0},\; C_{N,2},\; \ldots,\; C_{N,M-2},\; C_{N,M-1} - c_{M-1} C_{N-1,M} \right) \]

and the unknowns $x$ are

\[ x' = \left( C_{N-1,1},\; C_{N-1,2},\; \ldots,\; C_{N-1,M-1} \right). \]

At this point the system can be easily solved within R with solve(A, b) in order
to obtain the values of $C_{N-1,j}$, $j = 1, \ldots, M - 1$. We then need to
compare the value of $C_{N-1,j}$ with the payoff $K - j\Delta S$. If
$C_{N-1,j} < K - j\Delta S$ we set $C_{N-1,j} = K - j\Delta S$ and the time
$i = N - 1$ corresponds to the optimal exercise time. Once the column of values
$C_{N-1,j}$, $j = 1, \ldots, M - 1$, is available we can proceed with the
calculation of $C_{N-2,j}$ by updating the above system. And so forth, back to
the column $C_{0,j}$, $j = 1, \ldots, M - 1$. Although this method is
computationally more intensive than the explicit method, it converges if both
$\Delta t$ and $\Delta S$ are sufficiently small. The next code implements the
implicit difference method in R.

R> AmericanPutImp <- function(Smin = 0, Smax, T = 1, N = 10, M = 10,
+     K, r = 0.05, sigma = 0.01) {
+     Dt = T/N
+     DS = (Smax - Smin)/M
+     t <- seq(0, T, by = Dt)
+     S <- seq(Smin, Smax, by = DS)
+     # coefficients a_j, b_j and c_j of the implicit scheme
+     A <- function(j) 0.5 * r * j * Dt - 0.5 * sigma^2 * j^2 *
+         Dt
+     B <- function(j) 1 + sigma^2 * j^2 * Dt + r * Dt
+     C <- function(j) -0.5 * r * j * Dt - 0.5 * sigma^2 * j^2 *
+         Dt
+     a <- sapply(0:M, A)
+     b <- sapply(0:M, B)
+     c <- sapply(0:M, C)
+     P <- matrix(, M + 1, N + 1)
+     colnames(P) <- round(t, 2)
+     rownames(P) <- round(rev(S), 2)
+     # boundary conditions (7.8), (7.9) and (7.10)
+     P[M + 1, ] <- K
+     P[1, ] <- 0
+     P[, N + 1] <- sapply(rev(S), function(x) max(K - x, 0))
+     # tridiagonal matrix of the system
+     AA <- matrix(0, M - 1, M - 1)
+     for (j in 1:(M - 1)) {
+         if (j > 1)
+             AA[j, j - 1] <- A(j)
+         if (j < M)
+             AA[j, j] <- B(j)
+         if (j < M - 1)
+             AA[j, j + 1] <- C(j)
+     }
+     optTime <- matrix(FALSE, M + 1, N + 1)
+     for (i in (N - 1):0) {
+         I <- i + 1
+         # known terms, corrected for the two boundary values
+         bb <- P[M:2, I + 1]
+         bb[1] <- bb[1] - A(1) * P[M + 1 - 0, I + 1]
+         bb[M - 1] <- bb[M - 1] - C(M - 1) * P[M + 1 - M, I +
+             1]
+         P[M:2, I] <- solve(AA, bb)
+         # early exercise: replace the value by the payoff
+         idx <- which(P[, I] < P[, N + 1])
+         P[idx, I] <- P[idx, N + 1]
+         optTime[idx, I] <- TRUE
+     }
+     optTime[M + 1, ] <- TRUE
+     optTime[which(P[, N + 1] > 0), N + 1] <- TRUE
+     colnames(optTime) <- colnames(P)
+     rownames(optTime) <- rownames(P)
+     ans <- list(P = P, t = t, S = S, optTime = optTime, N = N,
+         M = M)
+     class(ans) <- "AmericanPut"
+     return(invisible(ans))
+ }
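To see the linear system at work in isolation, the following sketch performs one backward step ($i = N - 1$) of the implicit scheme for the parameters of Example 7.2.1. It is only an illustration: the names `a`, `b`, `cc`, `A`, `rhs`, `x` and `C_N1` are ours, and the boundary values are assumed to be $K - S_{\min} = K$ at $j = 0$ and $0$ at $j = M$, as in the text.

```r
K <- 30; r <- 0.05; sigma <- 0.4
N <- 10; M <- 10; Dt <- 1 / N; DS <- 60 / M

# Coefficients of the implicit scheme
a  <- function(j) 0.5 * r * j * Dt - 0.5 * sigma^2 * j^2 * Dt
b  <- function(j) 1 + sigma^2 * j^2 * Dt + r * Dt
cc <- function(j) -0.5 * r * j * Dt - 0.5 * sigma^2 * j^2 * Dt

# Tridiagonal matrix of the system, rows j = 1, ..., M - 1
j <- 1:(M - 1)
A <- diag(b(j))                                             # main diagonal
A[cbind(j[-1], j[-1] - 1)] <- a(j[-1])                      # sub-diagonal
A[cbind(j[-(M - 1)], j[-(M - 1)] + 1)] <- cc(j[-(M - 1)])   # super-diagonal

payoff <- pmax(K - (0:M) * DS, 0)     # C[N, j], j = 0, ..., M
rhs <- payoff[j + 1]                  # C[N, 1], ..., C[N, M - 1]
rhs[1] <- rhs[1] - a(1) * K           # boundary value C[N - 1, 0] = K
rhs[M - 1] <- rhs[M - 1] - cc(M - 1) * 0   # boundary value C[N - 1, M] = 0

x <- solve(A, rhs)                    # continuation values C[N - 1, j]
C_N1 <- pmax(x, payoff[j + 1])        # early-exercise check against the payoff
round(C_N1, 2)
```

Since the matrix is tridiagonal, a dedicated banded solver would be cheaper than the general `solve`, but for grids of this size the difference is immaterial.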

We do not need to change the plot method because the output of AmericanPutImp
has the same structure as that of AmericanPutExp. The only difference between
the two is that in the implicit method the value of the option under the
exercise frontier is replaced by the payoff according to the rule explained
above. We now again replicate Example 7.2.1.

R> put <- AmericanPutImp(Smax = 60, sigma = 0.4, K = 30)
R> round(put$P, 2)

and again plot the grid with

R> plot(put)

in Figure 7.4 to see the difference with the explicit difference method.

Figure 7.4 Implicit finite difference method grid for the American put option of
Example 7.2.1.


7.4 The quadratic approximation 

The idea due to McMillan (1986) and Barone-Adesi and Whaley (1987) was to
consider an approximation of the value of an option with a constant dividend $q$
and a constant rate $r$. We did not consider dividends previously, but it does not
make a substantial difference to the derivation of the partial differential equation
of the price of the option. For simplicity we explain the method for the American
call, and then finally give the formula also for the American put. When we have
constant dividends, the Black and Scholes equation (6.1) is simply transformed
into the following one

\[ \frac{\partial}{\partial t} C(t, x) + \frac{1}{2}\sigma^2 x^2 \frac{\partial^2}{\partial x^2} C(t, x) + (r - q) x \frac{\partial}{\partial x} C(t, x) - r C(t, x) = 0. \qquad (7.11) \]

We can introduce the so-called early exercise premium defined as

\[ e(t, x) = C_a(t, x) - C(t, x) \]

where $C_a(t, x)$ is the price of the American call and $C(t, x)$ is the price of
the European call. Both $C_a(t, x)$ and $C(t, x)$ solve (7.11) because of
non-arbitrage (the difference between the two is in the boundary conditions) so
also $e(t, x)$ solves the same partial differential equation. Let
$k_1 = 2r/\sigma^2$, $k_2 = 2(r - q)/\sigma^2$, $\tau = T - t$, and
$h(\tau) = 1 - e^{-r\tau}$. Then rewrite $e(t, x)$ as follows:

\[ e(t, x) = h(\tau)\, \eta(h, x) \]

then Equation (7.11) can be transformed into

\[ x^2 \frac{\partial^2}{\partial x^2}\eta + k_2 x \frac{\partial}{\partial x}\eta - \frac{k_1}{h}\eta - (1 - h) k_1 \frac{\partial}{\partial h}\eta = 0. \]


Now, notice that, as $\tau \to 0$, $\partial\eta/\partial h \to 0$, while as
$\tau \to \infty$ the term $(1 - h) \to 0$. Thus the term containing $1 - h$
can be dropped from the partial differential equation. Hence, we are left with
the simple ordinary differential equation

\[ x^2 \frac{d^2}{dx^2}\eta + k_2 x \frac{d}{dx}\eta - \frac{k_1}{h}\eta = 0. \qquad (7.12) \]

We now set $\eta = b x^{\gamma}$, thus (7.12) becomes a simple quadratic equation

\[ b x^{\gamma} \left( \gamma^2 + (k_2 - 1)\gamma - \frac{k_1}{h} \right) = 0 \]

with solutions

\[ \gamma_1 = \frac{-(k_2 - 1) - \sqrt{(k_2 - 1)^2 + 4\frac{k_1}{h}}}{2} < 0, \]

\[ \gamma_2 = \frac{-(k_2 - 1) + \sqrt{(k_2 - 1)^2 + 4\frac{k_1}{h}}}{2} > 0. \]

We need to exclude $\gamma_1 < 0$ because for the call option we necessarily have
$\partial e/\partial x > 0$, i.e. the early exercise premium increases with $S$.
So our solution is $\eta = b x^{\gamma_2}$, hence

\[ e(t, x) = h(\tau)\,\eta(h, x) = (1 - e^{-r\tau})\, b x^{\gamma_2} \]

or, better,

\[ C_a(t, x) = C(t, x) + (1 - e^{-r\tau})\, b x^{\gamma_2}. \]


Now, if $S^*$ is the value of $S_t$ above the strike price $K$ at which the
American call option is exercised, we can write

\[ S^* - K = C(t, S^*) + (1 - e^{-r\tau})\, b\, (S^*)^{\gamma_2}. \qquad (7.13) \]

Further, if we assume the continuity of the hedge ratio (the delta) of the option
at the point $S^*$, i.e. we differentiate Equation (7.13) with respect to $x$ and
evaluate it at $x = S^*$, we also obtain

\[ 1 = e^{-q\tau}\Phi(d^*) + (1 - e^{-r\tau})\, b\, \gamma_2\, (S^*)^{\gamma_2 - 1} \qquad (7.14) \]

where $d^*$ corresponds to $d_1$ for the European call option when $x = S^*$. We
can now solve Equation (7.14) for $b$ as a function of $S^*$

\[ b = \frac{1 - e^{-q\tau}\Phi(d^*)}{(1 - e^{-r\tau})\, \gamma_2\, (S^*)^{\gamma_2 - 1}} \]

and replace it in (7.13) which gives

\[ S^* - K = C(t, S^*) + \frac{1 - e^{-q\tau}\Phi(d^*)}{\gamma_2}\, S^* \]

which can be solved numerically. Hence, finally we have that

\[ C_a(t, S_t) = \begin{cases}
C(t, S_t) + \dfrac{1 - e^{-q\tau}\Phi(d^*)}{\gamma_2}\, S^* \left( \dfrac{S_t}{S^*} \right)^{\gamma_2}, & S_t < S^*, \\[2ex]
S_t - K, & S_t \ge S^*.
\end{cases} \]


With the same steps, one arrives at the equation:

\[ P_a(t, x) = P(t, x) + (1 - e^{-r\tau})\, c\, x^{\gamma_1} \]

with $P_a(t, x)$ being the price of the American put and $P(t, x)$ the price of
the European put. Therefore, as before, we can define $S^{**}$ as the price of
the underlying stock such that the American put is exercised and write

\[ K - S^{**} = P(t, S^{**}) + (1 - e^{-r\tau})\, c\, (S^{**})^{\gamma_1}. \]

Again, imposing the continuity of the delta, we obtain a second equation:

\[ -1 = -e^{-q\tau}\Phi(-d^{**}) + (1 - e^{-r\tau})\, c\, \gamma_1\, (S^{**})^{\gamma_1 - 1} \]

where $d^{**}$ is $d_1$ for the European put with $x = S^{**}$. Now, we can get an
explicit formula for $c$ in terms of $S^{**}$ and write

\[ K - S^{**} = P(t, S^{**}) - \frac{1 - e^{-q\tau}\Phi(-d^{**})}{\gamma_1}\, S^{**} \]

and solve it numerically for $S^{**}$. Finally, the price of the American put
option is given as follows:

\[ P_a(t, S_t) = \begin{cases}
K - S_t, & S_t \le S^{**}, \\[1ex]
P(t, S_t) - \dfrac{1 - e^{-q\tau}\Phi(-d^{**})}{\gamma_1}\, S^{**} \left( \dfrac{S_t}{S^{**}} \right)^{\gamma_1}, & S_t > S^{**}.
\end{cases} \]

Barone-Adesi and Whaley (1987) propose an efficient nonlinear algorithm to
solve the above problem. We do not treat this algorithm here but rather propose
the use of the fOptions package to solve this problem. In particular, the function
BAWAmericanApproxOption implements this functionality. We consider the same
setup of Example 7.2.1 and solve it using the quadratic approximation

R> require(fOptions)
R> T <- 1
R> sigma = 0.4
R> r = 0.05
R> S0 <- 36
R> K <- 30
R> BAWAmericanApproxOption("p", S = S0, X = K, Time = T, r = r,
+     b = r, sigma = sigma)@price
[1] 2.293530

and we can compare with the solution given by the implicit method

R> put <- AmericanPutImp(0, 100, T = T, K = K, r = r, sigma = sigma,
+     M = 100, N = 100)
R> put$P["36", 1]
[1] 2.264198

A quick remark: the quadratic approximation is a bit unstable for very long
maturities. Ju and Zhong (1999) provide an efficient modification of the
quadratic approximation which is also stable.
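For readers who want to see the put formulas of this section at work without the fOptions package, the following base R sketch solves the equation for $S^{**}$ with `uniroot` instead of the algorithm of Barone-Adesi and Whaley (1987). It is a minimal illustration under our own naming: `d1`, `euroPut`, `bawPut` and `prem` are hypothetical helpers, not functions of any package, and the root bracket $(10^{-6}, K)$ is an assumption that holds for the parameters used here.

```r
# d1 of the Black and Scholes formula with continuous dividend yield q
d1 <- function(x, K, tau, r, q, sigma)
    (log(x / K) + (r - q + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))

# European put price P(t, x)
euroPut <- function(x, K, tau, r, q, sigma) {
    D1 <- d1(x, K, tau, r, q, sigma)
    D2 <- D1 - sigma * sqrt(tau)
    K * exp(-r * tau) * pnorm(-D2) - x * exp(-q * tau) * pnorm(-D1)
}

# Quadratic approximation of the American put
bawPut <- function(S0, K, tau, r, q, sigma) {
    k1 <- 2 * r / sigma^2
    k2 <- 2 * (r - q) / sigma^2
    h <- 1 - exp(-r * tau)
    gamma1 <- (-(k2 - 1) - sqrt((k2 - 1)^2 + 4 * k1 / h)) / 2  # negative root

    # early exercise premium evaluated at a candidate critical price x
    prem <- function(x)
        -(1 - exp(-q * tau) * pnorm(-d1(x, K, tau, r, q, sigma))) * x / gamma1

    # S** solves K - x = P(t, x) + prem(x)
    g <- function(x) K - x - euroPut(x, K, tau, r, q, sigma) - prem(x)
    Sstar <- uniroot(g, lower = 1e-6, upper = K, tol = 1e-10)$root

    if (S0 > Sstar) {
        price <- euroPut(S0, K, tau, r, q, sigma) +
            prem(Sstar) * (S0 / Sstar)^gamma1
    } else {
        price <- K - S0
    }
    list(price = price, Sstar = Sstar, gamma1 = gamma1)
}

res <- bawPut(S0 = 36, K = 30, tau = 1, r = 0.05, q = 0, sigma = 0.4)
res$price
```

With no dividends ($q = 0$, which corresponds to `b = r` in the fOptions call above), the value should be close to the one returned by BAWAmericanApproxOption, since both evaluate the same approximation.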
7.5 Geske and Johnson and other approximations

Geske and Johnson (1984) proposed a method which consists in the
approximation of the true American option with a sequence of Bermudan
options (i.e. options which can be exercised only at given fixed dates). Let $P_1$
be the price of a European put option which can be exercised at time $T$; $P_2$
the price of a Bermudan option which can be exercised only at times $T/2$ and $T$
and let $P_3$ be the price of a Bermudan option which can be exercised at times
$T/3$, $2T/3$ and $T$. The so-called Richardson's extrapolation method allows us
to calculate the approximate value of an American option in this way:

\[ P_a = P_3 + \frac{7}{2}(P_3 - P_2) - \frac{1}{2}(P_2 - P_1). \]

The problem with this approximation is that the calculation of $P_2$ and $P_3$
requires evaluation of bivariate and trivariate Gaussian integrals. It was also
noticed by some authors that the approximation may not be uniform, so a modified
version of this approach was later proposed by Bunch and Johnson (1992). This
modification involves only two time instants but those have to be chosen in an
optimal way. Formally, the approximation formula is the following

\[ P_a = P_2^{\max} + (P_2^{\max} - P_1) \]

where $P_2^{\max}$ is a Bermudan option where the two time instants are chosen in a
way to maximize the value of the option.
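The two extrapolation formulas are simple enough to be written as one-line functions. In the sketch below the function names `geskeJohnson` and `bunchJohnson` are ours, and the input prices are made-up numbers used only to show the arithmetic; computing the actual Bermudan prices requires the multivariate Gaussian integrals mentioned above.

```r
# Three-point Richardson extrapolation of Geske and Johnson
geskeJohnson <- function(P1, P2, P3) P3 + 7/2 * (P3 - P2) - 1/2 * (P2 - P1)

# Two-point version with optimally chosen exercise dates
bunchJohnson <- function(P1, P2max) P2max + (P2max - P1)

# Hypothetical prices, for illustration only
P1 <- 2.20; P2 <- 2.25; P3 <- 2.27
gj <- geskeJohnson(P1, P2, P3)
bj <- bunchJohnson(P1, P2max = 2.26)
c(GeskeJohnson = gj, BunchJohnson = bj)
```

Both formulas return the common value when all inputs coincide, as expected from an extrapolation in the number of exercise dates.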

There are a number of other approximation methods which are not discussed
in this book. Some of these approximations admit explicit formulas but they
are quite long and the derivation is not that interesting. The package fOptions
implements the method of Bjerksund and Stensland (1993) via the function
BSAmericanApproxOption and the method known as Roll-Geske-Whaley
(Roll 1977; Geske 1979; Whaley 1981) via the function RollGeskeWhaleyOption.
The reader is invited to check the corresponding code.

7.6 Monte Carlo methods 

All the above mentioned methods work quite well for one-dimensional options
but closed or approximated formulas rarely exist in the case of American options
written on more than one asset. As we have seen in Chapter 4, one of the
advantages of the Monte Carlo method is that the computational complexity
does not increase (in general) with the dimensionality of the problem. Simulation
methods for pricing American options tend to exploit this particular feature of
the Monte Carlo method.

7.6.1 Broadie and Glasserman simulation method 

In their famous paper Broadie and Glasserman (1997) claim that it is not possible
to obtain an unbiased estimator of the value of an American option via simulation.
The main reason is that all simulation schemes are by their nature discrete and
hence, depending on the configuration of each simulated path, there is still a
positive probability to reverse the decision (exercise or not) at each step
(stopping time) of the simulation. As usual, $r$ is the risk-free interest rate,
$T$ the maturity, $K$ the strike price, and $S_T$ and $S_0$ respectively the
terminal and initial values of the stock price. Recall that the objective here is
to estimate the value

\[ C = \max_{\tau} E\left\{ e^{-r\tau} \max(S_\tau - K, 0) \right\} \]

over all possible stopping times $\tau \le T$. In practice this is only a finite
set of times on a grid as for the other numerical methods seen previously.
Therefore, given a grid of times $0 = t_0 < t_1 < \cdots < t_d = T$, the idea is to
simulate $S_1 = S_{t_1}$, $S_2 = S_{t_2}$, ..., $S_T = S_{t_d}$ starting from a
given $S_0$ and then estimate the price of the option using the discretized
version of the above formula

\[ C = \max_{i = 0, \ldots, d} E\left\{ e^{-r t_i} \max(S_i - K, 0) \right\}. \]

This simulation method generates a tree where at each step $b$ different new
branches are created. Thus, for example, starting from $S_0$, $b$ new values of
$S_1$ are simulated, say $S_1^1$, $S_1^2$, ..., $S_1^b$, for the possible future
prices of the stock at time $t_1$. Then, for each of the $S_1$ values at time
$t_1$, $b$ new branches are generated at time $t_2$. Therefore, starting from
$S_1^1$ at time $t_1$ we generate $b$ new future values $S_2^{11}$, $S_2^{12}$,
..., $S_2^{1b}$. And so forth for the other $b - 1$ nodes at time $t_1$. For
simplicity we consider $b = 3$ as in the original paper of Broadie and
Glasserman (1997). Figure 7.5 represents a simulated tree. It is important to
notice that the tree is ordered only horizontally with time and not vertically
with the asset price as in the numerical methods. This means that (see
Figure 7.5) there is no ordering between $S_2^{11}$, $S_2^{12}$ and $S_2^{13}$ but
they all depend on $S_1^1$. Similarly, $S_1^1$, $S_1^2$ and $S_1^3$ are not
ordered. Once such a simulated tree is available, it is possible to introduce



Figure 7.5 Example of simulated path with $b = 3$ branches per node.

Figure 7.6 Example of simulated path from Broadie and Glasserman (1997). In
parentheses the payoff at the node and in squared brackets the value of the upper
estimator $\Theta$ of the American call option.


two estimators of the option prices, both biased (one from above, $\Theta$, and
one from below, $\theta$) but asymptotically (with the number of Monte Carlo
simulations and the number of branches $b$ in the tree) unbiased. This leads to
the construction of confidence intervals for the real price of the American
option. We start with the description of the upper estimator $\Theta$. Consider
the bottom node in Figure 7.6 at time $t_1$. For this node the asset price is
$S = 115$ and the payoff of the call with a strike price of $K = 100$ is 15 (in
parentheses in the plot). The last three nodes at time $t = T$ have payoff 0, 49
and 16. The expected payoff is then $21.7 = (0 + 49 + 16)/3$. We compare this
value with the current payoff which is 15. Then the upper estimator is the
maximum between 15 and 21.7. We put this number in squared brackets. For the
middle node at time $t_1$ both the expected and current payoff are null, thus the
present value of the option is zero as well. For the upper node we need to
compare the present payoff, which is 14, with the expected payoff
$(0 + 0 + 2)/3 = 0.7$, therefore we put 14 in squared brackets. Then, the final
estimate is obtained by comparing the payoff at time 0, which is 1, and the
expected payoff at time $t_1$, i.e. $(14 + 0 + 21.7)/3 = 11.9 = \Theta$.

For the lower estimator the procedure is more involved. Consider again the
same bottom node at time $t_1$. The expected payoff at time $t = T$ is calculated
using branches 2 and 3, i.e. $(49 + 16)/2 = 32.5$. This value is compared with the
current payoff which is 15. In this case it is worth continuing, and the
continuation value is 0, the payoff of branch 1. The same procedure is repeated
considering only branches 1 and 3 to calculate the expected payoff, i.e.
$(0 + 16)/2 = 8$. In this case $8 < 15$, so the present value of the option is 15.
Finally, the expected payoff using branches 1 and 2 is $(0 + 49)/2 = 24.5$ which
is bigger than 15. So the continuation value is 16, the payoff of branch 3. Now
we have three different values: 0, 15 and 16. The estimated value for this node
is the mean of these values, i.e. $(0 + 15 + 16)/3 = 10.3$. The procedure is
iterated on the remaining nodes back to the initial node. The final lower
estimate is $\theta = 8.1$. Thus, for this option we have an upper bound
$\Theta = 11.9$ and a lower bound $\theta = 8.1$. These are upper and lower
estimates of the option in one single Monte Carlo replication. In order to get
convergence, we need to repeat this procedure for a sufficient number of Monte
Carlo replications. We now present a script which implements this procedure on a
single simulated tree. The reader can easily embed this code into a Monte Carlo
procedure. This code is designed to be sufficiently flexible: it accepts any
number of periods d and any number of branches b. In order to save memory, the
tree is stored in a vector where the initial node occupies position 0; its b
children occupy positions 1 to b; the b children of node 1 occupy positions
b + 1 to 2b; the b children of node 2 occupy positions 2b + 1 to 3b, and so
forth (in R, where vectors are indexed from 1, all positions are shifted by
one). Though flexible, this code has been written only for didactic reasons and
does not include dividends and the discounting factor. The next code defines a
simulator for the tree.

R> simTree <- function(b, d, S0, sigma, T, r) {
+     tot <- sum(b^(1:(d - 1)))
+     S <- numeric(tot + 1)
+     S[1] <- S0
+     dt <- T/d
+     # node i (counted from 0) generates the b nodes i * b + 1, ..., i * b + b
+     for (i in 0:(tot - b^(d - 1))) {
+         for (j in 1:b) {
+             S[i * b + j + 1] <- S[i + 1] * exp((r - 0.5 * sigma^2) *
+                 dt + sigma * sqrt(dt) * rnorm(1))
+         }
+     }
+     S
+ }

The next functions calculate the upper and lower estimators of the American
call option:

R> upperBG <- function(S, b, d, f) {
+     tot <- sum(b^(1:(d - 1)))
+     start <- tot - b^(d - 1) + 1
+     end <- tot + 1
+     P <- S
+     # payoff at the terminal nodes
+     P[start:end] <- f(S[start:end])
+     tot1 <- sum(b^(1:(d - 2)))
+     for (i in tot1:0) {
+         # average over the b children versus immediate exercise
+         m <- mean(P[i * b + 1:b + 1])
+         v <- f(S[i + 1])
+         P[i + 1] <- max(v, m)
+     }
+     P
+ }
R> lowerBG <- function(S, b, d, f) {
+     tot <- sum(b^(1:(d - 1)))
+     start <- tot - b^(d - 1) + 1
+     end <- tot + 1
+     p <- S
+     p[start:end] <- f(S[start:end])
+     tot1 <- sum(b^(1:(d - 2)))
+     m <- numeric(b)
+     for (i in tot1:0) {
+         v <- f(S[i + 1])
+         for (j in 1:b) {
+             # decide using all branches but j, then value with branch j
+             m[j] <- mean(p[i * b + (1:b)[-j] + 1])
+             m[j] <- ifelse(v > m[j], v, p[i * b + (1:b)[j] +
+                 1])
+         }
+         p[i + 1] <- mean(m)
+     }
+     p
+ }

Now we can test these functions on the example of Figure 7.6, so we prepare
the vector S according to the data in the picture.

R> b <- 3
R> d <- 3
R> S0 <- 101
R> S <- c(101, 114, 50, 115, 74, 88, 102, 38, 47, 65, 88, 149,
+     116)
R> K <- 100
R> f <- function(x) sapply(x, function(x) max(x - K, 0))

where f is the payoff of the call option. We now call the two functions to obtain
the lower and upper estimates:

R> lowerBG(S, b, d, f)
 [1]  8.111111 14.000000  0.000000 10.333333  0.000000  0.000000  2.000000
 [8]  0.000000  0.000000  0.000000  0.000000 49.000000 16.000000
R> upperBG(S, b, d, f)
 [1] 11.88889 14.00000  0.00000 21.66667  0.00000  0.00000  2.00000  0.00000
 [9]  0.00000  0.00000  0.00000 49.00000 16.00000

Figure 7.7 Example of simulated path from Broadie and Glasserman (1997). In
parentheses the payoff at the node and in squared brackets the value of the lower
estimator $\theta$ of the American call option.

Notice that the estimates of the values of the option correspond to the first
element of the returned vector but we print the entire sequence just to show the
correspondence between the values returned by these routines and the values
in Figures 7.6 and 7.7 taken from Broadie and Glasserman (1997). In general,
one has to simulate a trajectory and extract the values of the lower and upper
estimates as follows:

R> set.seed(123)
R> b <- 3
R> d <- 3
R> K <- 100
R> f <- function(x) sapply(x, function(x) max(x - K, 0))
R> T <- 1
R> r <- 0.05
R> sigma <- 0.4
R> S0 <- 101
R> S <- simTree(b, d, S0, sigma, T, r)
R> lowerBG(S, b, d, f)[1]
[1] 17.68876
R> upperBG(S, b, d, f)[1]
[1] 22.681

and embed the last three lines of code in a Monte Carlo script.
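One possible way of embedding these steps in a Monte Carlo loop is sketched below. To keep the block self-contained, the simulator is restated and the two estimators are merged into a single helper, `bgEstimates`, which is our own name and not a function used elsewhere in the chapter. As in the text, discounting and dividends are ignored, and the number of replications `nsim` is an arbitrary choice.

```r
# Tree simulator, same storage convention as in the text
simTree <- function(b, d, S0, sigma, T, r) {
    tot <- sum(b^(1:(d - 1)))
    S <- numeric(tot + 1)
    S[1] <- S0
    dt <- T / d
    for (i in 0:(tot - b^(d - 1)))
        S[i * b + 1:b + 1] <- S[i + 1] *
            exp((r - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * rnorm(b))
    S
}

# Low and high estimators computed in one backward pass
bgEstimates <- function(S, b, d, f) {
    tot <- sum(b^(1:(d - 1)))
    leaves <- (tot - b^(d - 1) + 2):(tot + 1)
    up <- lo <- numeric(tot + 1)
    up[leaves] <- lo[leaves] <- f(S[leaves])
    for (i in (tot - b^(d - 1)):0) {
        kids <- i * b + 1:b + 1          # positions of the b children
        v <- f(S[i + 1])                 # payoff of immediate exercise
        up[i + 1] <- max(v, mean(up[kids]))
        # decide with all branches but j, then value with branch j
        eta <- sapply(1:b, function(j)
            if (v > mean(lo[kids[-j]])) v else lo[kids[j]])
        lo[i + 1] <- mean(eta)
    }
    c(lower = lo[1], upper = up[1])
}

set.seed(123)
b <- 3; d <- 3; K <- 100
f <- function(x) pmax(x - K, 0)          # call payoff
nsim <- 1000
est <- replicate(nsim,
    bgEstimates(simTree(b, d, S0 = 101, sigma = 0.4, T = 1, r = 0.05), b, d, f))
rowMeans(est)   # Monte Carlo averages of the low and high estimators
```

Applied to the tree of Figure 7.6, `bgEstimates` returns the same pair of values (about 8.11 and 11.89) as the first elements of lowerBG and upperBG.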

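To understand the asymptotic results, we give a precise definition of the
estimator. The random tree with $b$ branches per node is represented by the array

\[ \left\{ S_t^{i_1 i_2 \cdots i_t} : t = 0, 1, \ldots, T;\; i_j = 1, \ldots, b;\; j = 1, \ldots, t \right\}. \]

A simulated path is then the sequence

\[ S_0,\; S_1^{i_1},\; S_2^{i_1 i_2},\; \ldots,\; S_T^{i_1 \cdots i_T} \]

as in Figure 7.6. The high estimator $\Theta$ is defined recursively as follows:

\[ \Theta_T^{i_1 \cdots i_T} = f_T\left( S_T^{i_1 \cdots i_T} \right) \]

and

\[ \Theta_t^{i_1 \cdots i_t} = \max\left\{ f_t\left( S_t^{i_1 \cdots i_t} \right),\; \frac{1}{b} \sum_{j=1}^{b} e^{-R_{t+1}}\, \Theta_{t+1}^{i_1 \cdots i_t j} \right\}, \quad t = 0, \ldots, T - 1, \]

where $h_t(x) = \max\{ f_t(x), g_t(x) \}$ is the value of the option at time $t$
when $S_t = x$, $g_t(x) = E\{ e^{-R_{t+1}} h_{t+1}(S_{t+1}) \mid S_t = x \}$ is the
continuation value at time $t$, $f_t(x)$ is the payoff at time $t$,
$h_T(x) = f_T(x)$ and $e^{-R_t}$ is the discount factor from $t - 1$ to $t$, with
$R_t \ge 0$. In our R code this factor is not included. The final high estimator
is $\Theta_0$ and $\bar\Theta_{0,n}(b)$ is the average value of $\Theta_0$ in a
Monte Carlo experiment with $n$ replications. Under regularity conditions the
following result holds true.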

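Theorem 7.6.1 (Broadie and Glasserman 1997) For a given number of
Monte Carlo replications $n$, not necessarily diverging to $\infty$,

\[ E\left( \bar\Theta_{0,n}(b) \right) \to h_0(S_0) \quad \text{as } b \to \infty \]

and for finite $b$ the bias is always positive:

\[ E\left( \bar\Theta_{0,n}(b) \right) \ge h_0(S_0). \]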

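The low estimator $\theta$ is defined recursively as follows. Let

\[ \theta_T^{i_1 \cdots i_T} = f_T\left( S_T^{i_1 \cdots i_T} \right) \]

and define

\[ \eta_t^{i_1 \cdots i_t j} = \begin{cases}
f_t\left( S_t^{i_1 \cdots i_t} \right), & \text{if } f_t\left( S_t^{i_1 \cdots i_t} \right) \ge \dfrac{1}{b - 1} \displaystyle\sum_{i = 1,\, i \ne j}^{b} e^{-R_{t+1}}\, \theta_{t+1}^{i_1 \cdots i_t i}, \\[2ex]
e^{-R_{t+1}}\, \theta_{t+1}^{i_1 \cdots i_t j}, & \text{otherwise}
\end{cases} \]

for $j = 1, \ldots, b$. Then let

\[ \theta_t^{i_1 \cdots i_t} = \frac{1}{b} \sum_{j=1}^{b} \eta_t^{i_1 \cdots i_t j}. \]

As before $\theta_0$ is the low estimate in a single simulation and
$\bar\theta_{0,n}(b)$ is the Monte Carlo estimate after $n$ replications. Then,
the following result holds true.

Theorem 7.6.2 (Broadie and Glasserman 1997) For a given number of Monte
Carlo replications $n$, not necessarily diverging to $\infty$,

\[ E\left( \bar\theta_{0,n}(b) \right) \to h_0(S_0) \quad \text{as } b \to \infty \]

and for finite $b$ the bias is always negative:

\[ E\left( \bar\theta_{0,n}(b) \right) \le h_0(S_0). \]

The approximate $(1 - \alpha)\%$ Monte Carlo confidence interval for the true price
of the American option is defined as

\[ \left[ \max\left\{ \max(S_0 - K, 0),\; \bar\theta_{0,n}(b) - z_{\alpha/2}\, s(\bar\theta_{0,n}) \right\},\;\; \bar\Theta_{0,n}(b) + z_{\alpha/2}\, s(\bar\Theta_{0,n}) \right] \]

where $s(\bar\theta_{0,n})$ is the standard deviation of $\bar\theta_{0,n}$,
$s(\bar\Theta_{0,n})$ the standard deviation of $\bar\Theta_{0,n}$ and
$z_{\alpha/2}$ the $1 - \alpha/2$ quantile of the Gaussian distribution. Notice
that the interval is cut from below by the payoff of immediate exercise. The
point estimate in a given replication is given by the following formula:

\[ \hat C = \frac{1}{2} \max\left\{ \max(S_0 - K, 0),\; \theta_0 \right\} + \frac{1}{2} \Theta_0. \]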
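The interval and the point estimate can be computed from the vectors of single-replication low and high estimates. The sketch below is only indicative: the function name `bgInterval` is ours, the standard deviation of each Monte Carlo average is assumed to be estimated by the sample standard deviation divided by $\sqrt{n}$, the point estimate is evaluated on the Monte Carlo averages rather than on a single replication, and the input vectors are made-up numbers.

```r
bgInterval <- function(lower, upper, S0, K, alpha = 0.05) {
    n <- length(lower)
    z <- qnorm(1 - alpha / 2)              # Gaussian quantile of order 1 - alpha/2
    exerciseNow <- max(S0 - K, 0)          # payoff of immediate exercise
    lo <- max(exerciseNow, mean(lower) - z * sd(lower) / sqrt(n))
    hi <- mean(upper) + z * sd(upper) / sqrt(n)
    point <- 0.5 * max(exerciseNow, mean(lower)) + 0.5 * mean(upper)
    c(lower = lo, point = point, upper = hi)
}

# Hypothetical low and high estimates from five replications
lower <- c(8.1, 17.7, 9.4, 12.0, 6.3)
upper <- c(11.9, 22.7, 13.0, 15.2, 9.8)
ci <- bgInterval(lower, upper, S0 = 101, K = 100)
ci
```

In practice `lower` and `upper` would be the outputs of the low and high estimators over many simulated trees.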


7.6.2 Longstaff and Schwartz Least Squares Method 

After the paper of Broadie and Glasserman (1997) many other variations of
the algorithm and new solutions have been proposed in the literature. The most
notable one is the Least Squares Method (LSM) developed by Longstaff and
Schwartz (2001). This method is extremely easy to understand and implement.
The key argument is the estimation of the continuation value by a simple
regression via Least Squares. This method requires only the simulation of
independent paths, rather than the construction of a tree, on a grid of times
$t_i$, $i = 0, 1, \ldots, d$. Let $V_i(x)$ and $f_i(x)$ denote respectively the
value of the option and the payoff function at time $t_i$ given $S_{t_i} = x$.
The continuation value at time $t_i$ given $S_{t_i} = x$ is

\[ C_i(x) = E\left\{ V_{i+1}(S_{t_{i+1}}) \mid S_{t_i} = x \right\} = \sum_{r=1}^{M} \beta_{ir}\, \psi_r(x) \]

for some basis functions $\psi_r$ and coefficients $\beta_{ir}$,
$r = 1, \ldots, M$. This representation comes from the fact that the conditional
expectation lives in an $L^2$ space where some basis exists, so it can be
represented in terms of the elements of the basis. The coefficients $\beta$ can
be estimated via simple regression using the values
$(S_{t_i}, V_{i+1}(S_{t_{i+1}}))$. Thus, the continuation value is estimated as

\[ \hat C_i(x) = \hat\beta_i'\, \psi(x) \qquad (7.15) \]

with

\[ \hat\beta_i' = (\hat\beta_{i1}, \ldots, \hat\beta_{iM}), \qquad \psi(x) = (\psi_1(x), \ldots, \psi_M(x))'. \]


Before going into the details of the algorithm, we present the working example
found in the original paper of Longstaff and Schwartz (2001) because it is very
instructive. In this example it is assumed that the strike price is $K = 1.10$, there
are only times 0, 1, 2 and $3 = T$, the interest rate is $r = 0.06$ and the initial
value of the asset is $S_0 = 1$. Only 8 paths have been generated and Table 7.1
reports each path by row. At time $t = 2$ only paths for which the option is in the money
are considered. These are indicated with an asterisk in Table 7.1. The holder
of the option decides whether to exercise immediately or wait until expiration by comparing
the present payoff with the discounted expected payoff. In our case, the discount
factor is $\exp(-r) = 0.94176$. The expected payoff is estimated via regression
taking into account the in-the-money trajectories, using this matrix:

Path   Y                X
1      .00 x 0.94176    1.08
2      -                -
3      .07 x 0.94176    1.07
4      .18 x 0.94176    0.97
5      -                -
6      .20 x 0.94176    0.77
7      .09 x 0.94176    0.84
8      -                -

From these data the following simple regression model can be estimated:
$E(Y \mid X) = -1.070 + 2.983X - 1.813X^2$. Putting $S_2$ in place of $X$, we obtain
the continuation values $C$ which are given in Table 7.1. For example, for
the first trajectory we have $-1.070 + 2.983 \cdot 1.08 - 1.813 \cdot (1.08)^2 = 0.0369$.
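The regression of this example can be reproduced with a few lines of base R. The following is only a sketch: the variable names are ours, and the discount factor is computed as exp(-0.06) instead of the rounded value 0.94176.

```r
# Time-2 stock prices (X) and discounted time-3 payoffs (Y) for the
# in-the-money paths 1, 3, 4, 6 and 7 of the example
X <- c(1.08, 1.07, 0.97, 0.77, 0.84)
Y <- c(0.00, 0.07, 0.18, 0.20, 0.09) * exp(-0.06)

# Quadratic least squares fit: Y = b0 + b1 * X + b2 * X^2
fit <- lm(Y ~ X + I(X^2))
beta <- unname(coef(fit))

# Estimated continuation values at the five in-the-money prices
continuation <- unname(fitted(fit))
round(beta, 3)
round(continuation, 4)
```

The fitted values are the estimated continuation values, to be compared path by path with the immediate exercise payoff.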


Table 7.1 Numerical example for the LSM algorithm from Longstaff and
Schwartz (2001). In parentheses the payoff at expiry date. The asterisk indicates
paths such that the option is in the money at time t = 2.

Path   t = 0   t = 1   t = 2   t = 3        Continuation C   Payoff at t = 2
1 *    1.00    1.09    1.08    1.34 (.00)   0.0369           0.02
2      1.00    1.16    1.26    1.54 (.00)   -                -
3 *    1.00    1.22    1.07    1.03 (.07)   0.0461           0.03
4 *    1.00    0.93    0.97    0.92 (.18)   0.1176           0.13
5      1.00    1.11    1.56    1.52 (.00)   -                -
6 *    1.00    0.76    0.77    0.90 (.20)   0.1520           0.33
7 *    1.00    0.92    0.84    1.01 (.09)   0.1565           0.26
8      1.00    0.88    1.22    1.34 (.00)   -                -

This value has to be compared with the value of immediate exercise which,
for the first path, is $K - S_2 = 1.10 - 1.08 = 0.02$. For paths 1 and 3 it is
worth waiting, while for paths 4, 6 and 7 it is more convenient to exercise
immediately. The algorithm proceeds backward for t = 1, using the same rule
and only considering trajectories which are in the money at time t = 1. Formally
the algorithm proceeds as follows:

(i) simulate $n$ independent paths $S_{t_i,j}$, $j = 1, \ldots, n$, on the grid of times

(ii) at $t_d = T$ set $V_{d,j} = f_d(S_{t_d,j})$, $j = 1, \ldots, n$,

(iii) for $i = d - 1, \ldots, 1$

    - specify the set $I$ of paths in the money

    - discount the values $V_{i+1,j}$, $j \in I$, to input in the regression

    - given the discounted values $V_{i+1,j}$, run the regression to get $\hat\beta_i$

    - estimate the continuation value $\hat C_i(S_{t_i,j})$ as in (7.15)

    - if $f_i(S_{t_i,j}) > \hat C_i(S_{t_i,j})$ set $V_{i,j} = f_i(S_{t_i,j})$, else set $V_{i,j}$ to the discounted $V_{i+1,j}$

(iv) calculate $(V_{1,1} + \cdots + V_{1,n})/n$ and discount it to get $V_0$.
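The steps above can be sketched on the eight paths of Table 7.1. This is a minimal illustration, not the implementation used later in the text: it assumes a put payoff, one time unit between exercise dates, and the plain quadratic regression of the worked example instead of a Laguerre basis. All object names are ours.

```r
# Stock paths of Table 7.1 (rows = paths, columns = t = 0, 1, 2, 3)
S <- matrix(c(1.00, 1.09, 1.08, 1.34,
              1.00, 1.16, 1.26, 1.54,
              1.00, 1.22, 1.07, 1.03,
              1.00, 0.93, 0.97, 0.92,
              1.00, 1.11, 1.56, 1.52,
              1.00, 0.76, 0.77, 0.90,
              1.00, 0.92, 0.84, 1.01,
              1.00, 0.88, 1.22, 1.34), ncol = 4, byrow = TRUE)
K <- 1.10
r <- 0.06
disc <- exp(-r)                       # one-period discount factor
payoff <- function(s) pmax(K - s, 0)  # put payoff

V <- payoff(S[, 4])                   # step (ii): value at expiry
for (i in 3:2) {                      # step (iii): columns of t = 2 and t = 1
    V <- V * disc                     # discount the values to the current date
    x <- S[, i]
    itm <- which(payoff(x) > 0)       # paths in the money
    fit <- lm(y ~ x + I(x^2), data = data.frame(x = x[itm], y = V[itm]))
    ex <- payoff(x[itm]) > fitted(fit)    # exercise if payoff beats continuation
    V[itm[ex]] <- payoff(x[itm[ex]])
}
V0 <- mean(V) * disc                  # step (iv)
V0
```

With these inputs the backward induction exercises paths 4, 6 and 7 at t = 2, as in the discussion of Table 7.1, and the resulting value is close to 0.114.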

As anticipated, the discounted value can be written as an expansion in a proper
$L^2$ space. Longstaff and Schwartz (2001) propose the following approximation
formula:

$$F(t_{k-1}) = \sum_{j=0}^{M} a_j L_j(X)$$

where the $a_j$ are constant coefficients and the basis is formed by the Laguerre
polynomials

$$L_0(x) = e^{-x/2}, \qquad L_1(x) = e^{-x/2}(1 - x), \qquad L_2(x) = e^{-x/2}\left(1 - 2x + \frac{x^2}{2}\right).$$

The next code is taken almost verbatim from the corresponding R code by Coskan
(2008) and implements the algorithm using the first three Laguerre polynomials.

R> LSM <- function(n, d, S0, K, sigma, r, T) {
+     s0 <- S0/K
+     dt <- T/d
+     z <- rnorm(n)
+     s.t <- s0 * exp((r - 1/2 * sigma^2) * T + sigma * z * (T^0.5))
+     s.t[(n + 1):(2 * n)] <- s0 * exp((r - 1/2 * sigma^2) * T -
+         sigma * z * (T^0.5))
+     CC <- pmax(1 - s.t, 0)
+     payoffeu <- exp(-r * T) * (CC[1:n] + CC[(n + 1):(2 * n)])/2 *
+         K
+     euprice <- mean(payoffeu)
+     for (k in (d - 1):1) {
+         z <- rnorm(n)
+         mean <- (log(s0) + k * log(s.t[1:n]))/(k + 1)
+         vol <- (k * dt/(k + 1))^0.5 * z
+         s.t.1 <- exp(mean + sigma * vol)
+         mean <- (log(s0) + k * log(s.t[(n + 1):(2 * n)]))/(k +
+             1)
+         s.t.1[(n + 1):(2 * n)] <- exp(mean - sigma * vol)
+         CE <- pmax(1 - s.t.1, 0)
+         idx <- (1:(2 * n))[CE > 0]
+         discountedCC <- CC[idx] * exp(-r * dt)
+         basis1 <- exp(-s.t.1[idx]/2)
+         basis2 <- basis1 * (1 - s.t.1[idx])
+         basis3 <- basis1 * (1 - 2 * s.t.1[idx] + (s.t.1[idx]^2)/2)
+         p <- lm(discountedCC ~ basis1 + basis2 + basis3)$coefficients
+         estimatedCC <- p[1] + p[2] * basis1 + p[3] * basis2 +
+             p[4] * basis3
+         EF <- rep(0, 2 * n)
+         EF[idx] <- (CE[idx] > estimatedCC)
+         CC <- (EF == 0) * CC * exp(-r * dt) + (EF == 1) * CE
+         s.t <- s.t.1
+     }
+     payoff <- exp(-r * dt) * (CC[1:n] + CC[(n + 1):(2 * n)])/2
+     usprice <- mean(payoff * K)
+     error <- 1.96 * sd(payoff * K)/sqrt(n)
+     earlyex <- usprice - euprice
+     data.frame(usprice, error, euprice)
+ }

The function LSM returns the estimated value of the American option, the radius
of the Monte Carlo confidence interval at the 95% level (the constant 1.96 in the
code is the corresponding Gaussian quantile), and the corresponding value
of the European option. We compare this value with the approximation formulas
of Sections 7.5 and 7.4 as implemented in the Rmetrics package fOptions.

R> S0 <- 36
R> K <- 30
R> T <- 1
R> r <- 0.05
R> sigma <- 0.4
R> LSM(10000, 3, S0, K, sigma, r, T)
   usprice      error  euprice
1 2.230353 0.04077382 2.201232
R> require(fOptions)
R> BSAmericanApproxOption("p", S0, K, T, r, r, sigma)@price
[1] 2.250694
R> BAWAmericanApproxOption("p", S0, K, T, r, r, sigma)@price
[1] 2.293530
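As a further sanity check, the Monte Carlo value euprice can be compared with the closed-form Black and Scholes price of the European put. The sketch below uses only base R; the helper name bs.put is ours and is not part of fOptions.

```r
# Closed-form Black and Scholes price of a European put (no dividends)
bs.put <- function(S0, K, T, r, sigma) {
    d1 <- (log(S0/K) + (r + 0.5 * sigma^2) * T)/(sigma * sqrt(T))
    d2 <- d1 - sigma * sqrt(T)
    K * exp(-r * T) * pnorm(-d2) - S0 * pnorm(-d1)
}

# Same parameters as in the LSM example above
S0 <- 36; K <- 30; T <- 1; r <- 0.05; sigma <- 0.4
european <- bs.put(S0, K, T, r, sigma)
european
```

The American put estimates reported above should not fall below this European value, up to Monte Carlo error.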

7.7 Bibliographical notes 

The present review of methods is not exhaustive, but it is a summary of what can
be reasonably understood without introducing more advanced topics. An introduction
to finite difference methods as explained here can be found in Hull (2000).
Monte Carlo methods and their extensions can be found in Glasserman (2004).
In Coskan (2008) the author also introduces the RSA approach, an efficient
modification of the LSM algorithm for the one-dimensional case. He also provides
all relevant R code for special two-dimensional American basket options.

8

Pricing outside the standard Black and Scholes model


The standard Black and Scholes model relies on several assumptions which allow 
for explicit formulas and easy calculations in most cases. Unfortunately some of 
these hypotheses like the constant volatility, Gaussianity of the returns and the 
continuity of the paths of the geometric Brownian motion process are unlikely 
to hold for many observed prices. Indeed, prices often show jumps, changes in 
volatility (see e.g. Section 6.6.1) and the distribution of the returns is usually
skewed and heavy-tailed (see e.g. Section 5.4). Most of these stylized facts
are indeed captured by Levy processes as we have seen. In this chapter we 
mainly consider the problem of pricing under the assumption that the dynamic 
of the financial prices includes some kind of jump process and/or non-Gaussian 
behaviour. We start with the simpler and most studied Levy process. 

8.1 The Levy market model 

Consider again the simple exponential Levy model analyzed in Sections 4.5.5
and 5.4

$$S_t = S_0 e^{Z_t} \tag{8.1}$$

where $Z_t$ is a Levy process with triplet $(b, c, \nu)$ and canonical decomposition

$$Z_t = bt + \sqrt{c}\,W_t + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \nu^Z)(ds, dx)$$

such that $E|Z_1| < \infty$. We have seen that it is possible to estimate the infinitely
divisible distribution of the Levy process quite easily from the log-returns of real financial
data, due to the fact that the increments of the Levy process are independent.
The properties of the Levy process $Z_t$ naturally propagate to the asset price
process $S_t$; thus, if $Z_t$ is a pure jump process, then $S_t$ is also a pure jump process.
We have seen in Example 3.18.10 that, applying the Ito formula (3.56) to (8.1),
$S_t$ solves the stochastic differential equation

$$dS_t = S_{t-}\left(dZ_t + \frac{c}{2}\,dt + \int_{\mathbb{R}} (e^x - 1 - x)\,\mu^Z(dt, dx)\right).$$

As usual, to develop a theory of option pricing, one way is to identify a change
of measure which makes the discounted process $\tilde S_t = e^{-rt}S_t$ a martingale under
this new measure. In this section we consider the general case with dividends
and hence we will focus on the discounted process $\tilde S_t = e^{-(r-\delta)t}S_t$, where $\delta$ is
the continuous dividend yield.

The first task is then to construct a proper change of measure such that the 
discounted process is a martingale. We have seen already under which conditions 
a Levy process can be transformed into a martingale in Section 3.18.6 and we 
now exploit this approach. 

8.1.1 Why is the Levy market incomplete?

The problem with Levy markets is that a whole set of equivalent martingale
measures exists and this, in turn, implies that the market is incomplete. To
understand why multiple martingale measures exist in the Levy market we follow
Papapantoleon (2008). Let us denote by $P$ and $Q$ the true physical measure
of $Z_t$ and the equivalent martingale measure which makes $\tilde S_t = e^{-(r-\delta)t}S_t$ a
martingale. We denote the corresponding characteristic triplets of $P$ and $Q$ as
$(b, c, \nu)$ and $(\tilde b, \tilde c, \tilde\nu)$ respectively. The two measures are related, via the Girsanov
theorem, in this way

$$\tilde c = c, \qquad \tilde\nu = Y\nu, \qquad \tilde b = b + c\beta + x(Y - 1) * \nu, \tag{8.2}$$

with $(\beta, Y)$ the two processes from Theorem 3.18.13. Under $Q$, $Z_t$ has the
canonical decomposition

$$Z_t = \tilde b t + \sqrt{\tilde c}\,\tilde W_t + \int_0^t\int_{\mathbb{R}} x\,(\mu^Z - \tilde\nu^Z)(ds, dx)$$

where $\tilde W_t$ is a Brownian motion under $Q$ and $\tilde\nu^Z$ is the compensator of the jump
measure $\mu^Z$ under $Q$ (see again Theorem 3.18.13). But, in order to have a
martingale, the following condition must hold

$$\tilde b = r - \delta - \frac{\tilde c}{2} - \int_{\mathbb{R}} (e^x - 1 - x)\,\tilde\nu(dx). \tag{8.3}$$

Therefore, equating (8.2) and (8.3) with $\tilde c = c$ and $\tilde\nu = Y\nu$, we obtain

$$\begin{aligned}
0 &= b + c\beta + x(Y - 1) * \nu - r + \delta + \frac{c}{2} + (e^x - 1 - x)Y * \nu \\
  &= b - r + \delta + c\left(\beta + \frac12\right) + \left((e^x - 1)Y - x\right) * \nu.
\end{aligned} \tag{8.4}$$

This means that, in order to obtain a martingale, we need to solve one equation with
two unknowns, $\beta$ and $Y$, and each solution of (8.4) in terms of the pair $(\beta, Y)$
provides an equivalent martingale measure. As noticed in Papapantoleon (2008),
this result is completely in line with what we know about market completeness
in the Black and Scholes model. Indeed, for the geometric Brownian motion the
Levy process takes the form $Z_t = bt + \sqrt{c}\,W_t$. Then $\nu = 0$ and Equation (8.4) has a unique
solution, which is:

$$\beta = \frac{r - \delta - b}{c} - \frac12.$$

So the martingale measure is unique and the market complete.

In their seminal paper Eberlein and Jacod (1997) characterized the range
of possible fair prices of options under the exponential Levy model. Let us denote
by $Q$ a generic equivalent martingale measure and let $X$ be the payoff of a
contingent claim. Then, at time zero

$$P_0^Q = e^{-rT}E_Q(X).$$

Then $P_0^Q \in [m, M]$ where

$$m = \inf\{e^{-rT}E_Q(X) \mid Q \text{ an equivalent martingale measure}\}$$

$$M = \sup\{e^{-rT}E_Q(X) \mid Q \text{ an equivalent martingale measure}\}$$

Without any additional information, this interval is usually very large, but for
options with payoff $X = f(S_T)$, $m = e^{-rT}f(e^{rT}S_0)$ and $M = S_0$. In addition,
it is possible to prove that if the price falls outside this interval, there are
arbitrage opportunities, so arbitrage-free option prices should lie in that interval
and the result is independent of the particular Levy model chosen to construct
the process $S_t$. So now the problem is how to choose one among all equivalent
martingale measures. We will consider a few examples from the different
alternative proposals available in the literature.
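To get a feeling for how wide the interval $[m, M]$ is, the bounds can be evaluated for a European call. The sketch below assumes the call payoff $f(x) = \max(x - K, 0)$; the helper name levy.bounds and the parameter values are only illustrative.

```r
# No-arbitrage bounds [m, M] for a European call, f(x) = max(x - K, 0)
levy.bounds <- function(S0, K, r, T) {
    f <- function(x) pmax(x - K, 0)
    c(m = exp(-r * T) * f(exp(r * T) * S0), M = S0)
}

bounds <- levy.bounds(S0 = 100, K = 90, r = 0.05, T = 0.25)
bounds
```

The lower bound equals $\max(S_0 - Ke^{-rT}, 0)$, while the upper bound is the price of the underlying asset itself, so the interval alone is of little practical use for selecting a price.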

8.1.2 The Esscher transform 

Gerber and Shiu (1994) first introduced the so-called Esscher transform
to find one possible equivalent martingale measure. Let $f(x)$ be the density
of $Z_1$ under the real measure $P$. Let $\theta$ be a real number such that
$\theta \in \{\theta \in \mathbb{R} : \int_{\mathbb{R}} \exp(\theta x)f(x)\,dx < \infty\}$. Then, we can construct a new density

$$f_\theta(x) = \frac{\exp(\theta x)f(x)}{\int_{\mathbb{R}} \exp(\theta x)f(x)\,dx}$$

and choose a value of $\theta$ such that the discounted price process, $\tilde S_t = e^{-(r-\delta)t}S_t$,
is a martingale, i.e.

$$S_0 = e^{-(r-\delta)t}E_\theta(S_t)$$

where $E_\theta$ is the expectation under $f_\theta(\cdot)$. Now let $\varphi(u) = E\exp(iuZ_1)$ be the
characteristic function of $Z_1$; then in order to have a martingale $\theta$ must solve
the following equation:

$$\exp(r - \delta) = \frac{\varphi(-i(\theta + 1))}{\varphi(-i\theta)}. \tag{8.5}$$


We denote this solution by $\theta^*$; thus under $f_{\theta^*}(\cdot)$ the discounted process is a
martingale and the corresponding equivalent martingale measure is the one with
density $f_{\theta^*}$. Gerber and Shiu (1996) justify the choice of this particular equivalent
martingale measure in terms of utility-maximization theory. A nice property
of this approach is that the new density is still infinitely divisible. Moreover,
if $Z_1$ has characteristic triplet $(b, c, \nu)$ then the characteristic function
$\varphi_\theta(u) = E_\theta\exp(iuZ_1)$ is given by

$$\varphi_\theta(u) = \frac{\varphi(u - i\theta)}{\varphi(-i\theta)}$$

and has triplet $(b_\theta, c, \nu_\theta)$, where

$$b_\theta = b + c\,\theta + \int_{\mathbb{R}} x\,(e^{\theta x} - 1)\,\nu(dx), \qquad \nu_\theta(dx) = e^{\theta x}\nu(dx).$$

Last but not least, this transform is very easy to obtain for many models.

Example 8.1.1 Let us consider the Black and Scholes model. In this
case $Z_1 \sim N(\mu - \frac12\sigma^2, \sigma^2)$, hence the characteristic function is
$\varphi(u) = \exp\left(iu(\mu - \frac12\sigma^2) - \frac12\sigma^2u^2\right)$. The martingale condition (8.5) becomes

$$r - \delta = \mu - \frac12\sigma^2 + \frac12\sigma^2(2\theta + 1)$$

thus

$$\theta^* = \frac{r - \delta - \mu}{\sigma^2}.$$

The density of the equivalent martingale measure is

$$f_{\theta^*}(x) = \frac{\exp\left(\theta^*x - \frac{(x - \mu + \frac12\sigma^2)^2}{2\sigma^2}\right)}{\int_{\mathbb{R}} \exp\left(\theta^*y - \frac{(y - \mu + \frac12\sigma^2)^2}{2\sigma^2}\right)dy}
= \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left(x - (r - \delta - \frac12\sigma^2)\right)^2}{2\sigma^2}\right)$$

which is the density of the Gaussian law $N(r - \delta - \frac12\sigma^2, \sigma^2)$, as expected for this
model from Section 6.4.
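When no closed form is available, equation (8.5) can be solved numerically. The following sketch does this for the Black and Scholes characteristic function of the example, so that the numerical root can be compared with $\theta^* = (r - \delta - \mu)/\sigma^2$. The parameter values, the search interval and the function name esscher.eq are illustrative assumptions.

```r
# Black and Scholes characteristic function of Z_1 under P
# (illustrative parameter values)
mu <- 0.10; sigma <- 0.25; r <- 0.05; delta <- 0
phi <- function(u) exp((0+1i) * u * (mu - 0.5 * sigma^2) -
    0.5 * sigma^2 * u^2)

# Equation (8.5) on the log scale: log of the ratio minus (r - delta)
esscher.eq <- function(theta) {
    ratio <- phi(-(0+1i) * (theta + 1))/phi(-(0+1i) * theta)
    Re(log(ratio)) - (r - delta)
}

theta.num <- uniroot(esscher.eq, interval = c(-20, 20), tol = 1e-12)$root
theta.exact <- (r - delta - mu)/sigma^2
c(numerical = theta.num, exact = theta.exact)
```

For other models only the definition of phi changes, provided the search interval lies inside the set of values of $\theta$ for which the exponential moment is finite.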


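For the Meixner process with $Z_1 \sim \text{Meixner}(\alpha, \beta, \gamma)$, the Esscher transform
gives a new martingale measure with parameters $\text{Meixner}(\alpha, \alpha\theta^* + \beta, \gamma)$ with

$$\theta^* = -\frac{1}{\alpha}\left(\beta + 2\arctan\left(\frac{-\cos(\alpha/2) + \exp\left(\frac{\delta - r}{2\gamma}\right)}{\sin(\alpha/2)}\right)\right)$$

while for the $\text{NIG}(\alpha, \beta, \gamma)$ the resulting equivalent martingale measure is
$\text{NIG}(\alpha, \theta^* + \beta, \gamma)$ with $\theta^*$ the solution of

$$r - \delta = \gamma\left(\sqrt{\alpha^2 - (\beta + \theta)^2} - \sqrt{\alpha^2 - (\beta + \theta + 1)^2}\right).$$

In many other cases, the solution is not explicit but can be obtained numerically.
For more details see Schoutens (2003).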


8.1.3 The mean-correcting martingale measure 

We have seen that adjusting for the mean is a way to transform the discounted
process into a martingale. Assume that $\varphi(u)$ is the characteristic function of
a Levy process $Z_t$. It is possible to construct a new process $\tilde Z_t$ by adding a
constant drift $mt$, so that the distribution of $Z_t$ is translated
by the same amount. This is done by directly setting $\tilde Z_t = Z_t + mt$ and, in terms of
characteristic functions,

$$\tilde\varphi(u) = \varphi(u)\exp(ium).$$

In this way the characteristic triplet changes simply from $(b, c, \nu)$ to
$(\tilde b = b + m, \tilde c = c, \tilde\nu = \nu)$ and the two densities are related by the formula
$\tilde f(x) = f(x - m)$. In this setup, all the processes of Section 5.4.1 are transformed as follows:
Meixner$(\alpha, \beta, \delta)$ into Meixner$(\alpha, \beta, \delta, m)$; NIG$(\alpha, \beta, \delta)$ into NIG$(\alpha, \beta, \delta, m)$;
VG$(C, G, M)$ into VG$(C, G, M, m)$; etc. Each model requires a different value
of $m$. According to Schoutens (2003), the solution requires the following identity,
where $\delta$ is the dividend yield:

$$m' = m + r - \delta - \log\varphi(-i).$$

The initial $m$ can be zero. For example, the Black and Scholes model is such
that $\varphi(-i) = e^{\mu}$. The physical measure has $m = \mu - \frac12\sigma^2$ and the new measure
has $m' = r - \delta - \frac12\sigma^2$, and we obtain again the standard result. Table 6.2 in
Schoutens (2003) provides a summary of the mean transformation for most of
the processes of Section 5.4.1.
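The mean-correcting transform is easy to express in code. The sketch below assumes an initial $m = 0$ and a characteristic function phi of $Z_1$ passed as an R function; mean.correct is a hypothetical helper name and the parameter values are illustrative.

```r
# Mean-correcting transform of a characteristic function phi of Z_1,
# starting from m = 0 ('mean.correct' is a hypothetical helper name)
mean.correct <- function(phi, r, delta = 0) {
    m <- r - delta - log(phi(-(0+1i)))
    function(u) phi(u) * exp((0+1i) * u * m)
}

# Black and Scholes example with illustrative parameter values
mu <- 0.10; sigma <- 0.25; r <- 0.05; delta <- 0.02
phiBS <- function(u) exp((0+1i) * u * (mu - 0.5 * sigma^2) -
    0.5 * sigma^2 * u^2)
phi.tilde <- mean.correct(phiBS, r, delta)

# Martingale check: E exp(Z_1) under the new measure equals exp(r - delta)
c(Re(phi.tilde(-(0+1i))), exp(r - delta))
```

The check at $u = -i$ is the one-period martingale condition: the corrected process grows on average at the rate $r - \delta$.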

8.1.4 Pricing of European options 

Once a martingale measure $Q$ is available, it is possible to price options under
the Levy market. If we have a contingent $T$-claim such that the payoff function
$f(\cdot)$ depends only on $S_T$, the usual formula applies

$$E_Q\{e^{-rT}f(S_T)\}.$$

When $Q$ has been obtained with the Esscher transform, then the above expected
value is calculated as

/ OO 

ge*(x)dx 

/ CO 

ge*(x) dx 

where $g_{\theta^*}(\cdot)$ is the density of $Q$ and $c = \log(K/S_0)$.


8.1.5 Option pricing using Fast Fourier Transform method 

Another way is to use the FFT algorithm to invert the characteristic function of the
transformed process. The idea dates back to Scott (1997), Carr and Madan (1998)
and Bakshi and Madan (2000). For simplicity we assume no dividends, i.e.
$\delta = 0$, and we describe the approach under the physical measure. Let
$\varphi(u) = E(\exp(iu\log S_T))$ be the characteristic function of the random variable $\log S_T$.
For a European call option the general formula for the price is

$$C(K, T) = S_0\Pi_1 - Ke^{-rT}\Pi_2$$

where $\Pi_2 = P(S_T > K)$ is the probability of finishing in the money and $\Pi_1$ is the
delta. These two values can be obtained from the inversion of the characteristic
function $\varphi(u)$ as follows:

$$\Pi_1 = \frac12 + \frac1\pi\int_0^\infty \mathrm{Re}\left(\frac{\exp(-iu\log K)\,E(\exp(i(u - i)\log S_T))}{iu\,E(S_T)}\right)du
= \frac12 + \frac1\pi\int_0^\infty \mathrm{Re}\left(\frac{\exp(-iu\log K)\,\varphi(u - i)}{iu\,\varphi(-i)}\right)du,$$

$$\Pi_2 = \frac12 + \frac1\pi\int_0^\infty \mathrm{Re}\left(\frac{\exp(-iu\log K)\,E(\exp(iu\log S_T))}{iu}\right)du
= \frac12 + \frac1\pi\int_0^\infty \mathrm{Re}\left(\frac{\exp(-iu\log K)\,\varphi(u)}{iu}\right)du.$$

The above two integrals can be evaluated by numerical integration when the
characteristic function is known, as in the exponential Levy models. Another way
to approach this problem is to use the FFT algorithm to obtain the price of the option.
Let $k = \log K$ be the logarithm of the strike price and let $C_T(k)$ be the price
of the call option with maturity $T$ and strike $\exp(k)$. Let $s_T = \log S_T$ and $q_T(\cdot)$ the
density of $s_T$ under the equivalent martingale measure. The characteristic function of the
density $q_T(\cdot)$ is defined as

$$\varphi_T(u) = \int_{\mathbb{R}} e^{ius}q_T(s)\,ds.$$

Then

$$C_T(k) = \int_k^\infty e^{-rT}(e^s - e^k)\,q_T(s)\,ds.$$


This function of $k$ is not square integrable because it does not converge to zero
but to $S_0$ as $k$ goes to $-\infty$; therefore Carr and Madan (1998) propose to introduce
an exponential damping factor $e^{\alpha k}$ into the function $C_T(k)$. The new function is
defined as

$$c_T(k) = e^{\alpha k}C_T(k)$$

which is possibly square integrable in $k$ for some range of values of $\alpha$. The
characteristic function of $c_T(k)$, denoted by $\psi_T(\cdot)$, is given by

$$\psi_T(v) = \int_{\mathbb{R}} e^{ivk}c_T(k)\,dk.$$

The idea is to express $C_T(k)$ as a function of $\psi_T(\cdot)$

$$C_T(k) = \frac{e^{-\alpha k}}{2\pi}\int_{-\infty}^{\infty} e^{-ivk}\psi_T(v)\,dv = \frac{e^{-\alpha k}}{\pi}\int_0^\infty e^{-ivk}\psi_T(v)\,dv \tag{8.6}$$

where the last equality follows from the fact that $C_T(k)$ is a real number. The
last step is to establish a relationship between $\psi_T(\cdot)$ and $\varphi_T(\cdot)$ and then apply
the FFT algorithm. Luckily, this is quite straightforward. In fact,


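$$\begin{aligned}
\psi_T(v) &= \int_{\mathbb{R}} e^{ivk}\int_k^\infty e^{\alpha k}e^{-rT}(e^s - e^k)\,q_T(s)\,ds\,dk \\
&= \int_{\mathbb{R}} e^{-rT}q_T(s)\int_{-\infty}^{s}\left(e^{s+\alpha k} - e^{(1+\alpha)k}\right)e^{ivk}\,dk\,ds \\
&= \int_{\mathbb{R}} e^{-rT}q_T(s)\left[\frac{e^{(\alpha+1+iv)s}}{\alpha + iv} - \frac{e^{(\alpha+1+iv)s}}{\alpha + 1 + iv}\right]ds \\
&= e^{-rT}\left(\frac{1}{\alpha + iv} - \frac{1}{\alpha + 1 + iv}\right)\int_{\mathbb{R}} e^{(\alpha+1+iv)s}q_T(s)\,ds \\
&= \frac{e^{-rT}\varphi_T(v - (\alpha + 1)i)}{\alpha^2 + \alpha - v^2 + i(2\alpha + 1)v}.
\end{aligned}$$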


Now this version of $\psi_T(\cdot)$ can be directly plugged into (8.6).

The unsolved issue is how to choose $\alpha$. The first thing to notice is that when
$\alpha = 0$ the denominator of $\psi_T(v)$ vanishes at $v = 0$, thus the factor $\exp(\alpha k)$ is required in a sense. In order
to have integrability, we need to check whether $\psi_T(0) < \infty$, and $\psi_T(0)$ is finite
only if $\varphi_T(-(\alpha + 1)i)$ is finite. This requirement is a condition on the following
expected value

$$E(S_T^{\alpha+1}) < \infty.$$

In practice, one has to find the maximal value of $\alpha$ such that the above moment
condition is satisfied. Carr and Madan (1998) propose to choose $\alpha$ to be one
fourth of the maximal $\alpha$.
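Before turning to the FFT implementation, the direct inversion formulas for $\Pi_1$ and $\Pi_2$ given earlier in this section can be checked numerically with base R's integrate function. The sketch below assumes the Black and Scholes model, with the characteristic function of $\log S_T$ written under the risk-neutral measure; the parameter values and the truncation of the integration range at 100 are illustrative choices.

```r
# Risk-neutral characteristic function of log(S_T) in the Black and
# Scholes model (illustrative parameter values, no dividends)
S0 <- 100; K <- 110; r <- 0.05; T <- 1/4; sigma <- 0.25
phi <- function(u) exp((0+1i) * u * (log(S0) + (r - 0.5 * sigma^2) * T) -
    0.5 * sigma^2 * u^2 * T)

# Integrands of the inversion formulas for Pi_1 and Pi_2
int1 <- function(u) {
    i <- (0+1i)
    Re(exp(-i * u * log(K)) * phi(u - i)/(i * u * phi(-i)))
}
int2 <- function(u) {
    i <- (0+1i)
    Re(exp(-i * u * log(K)) * phi(u)/(i * u))
}

# The Gaussian factor makes both integrands negligible well before u = 100
Pi1 <- 0.5 + integrate(int1, 0, 100, rel.tol = 1e-08)$value/pi
Pi2 <- 0.5 + integrate(int2, 0, 100, rel.tol = 1e-08)$value/pi
call.price <- S0 * Pi1 - K * exp(-r * T) * Pi2
call.price
```

The result can be compared with the exact Black and Scholes price for the same parameters. For models with heavier tails the truncation point has to be chosen with more care.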


8.1.6 The numerical implementation of the FFT pricing 

Using the trapezoidal rule to approximate the integral, Equation (8.6) can be
rewritten in approximate form as follows:

$$C_T(k) \approx \frac{e^{-\alpha k}}{\pi}\sum_{j=1}^{N} e^{-iv_jk}\psi_T(v_j)\,\eta \tag{8.7}$$

where $v_j = \eta(j - 1)$. According to Carr and Madan (1998), one should keep
in mind that the effective upper bound of integration is $a = N\eta$. We set up a
regular grid for the argument $k$ (the log strike level) in the following form:

$$k_u = -b + \lambda(u - 1), \qquad u = 1, \ldots, N,$$

where $\lambda$ is the step size of the regular grid. Thus, this grid spans the interval
from $-b$ to $b$, where $b = \frac12 N\lambda$. Now replacing these quantities into (8.7) we have:

$$C_T(k_u) \approx \frac{e^{-\alpha k_u}}{\pi}\sum_{j=1}^{N} e^{-i\lambda\eta(j-1)(u-1)}e^{ibv_j}\psi_T(v_j)\,\eta.$$

To apply the FFT we need to impose this condition:

$$\lambda\eta = \frac{2\pi}{N}.$$
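The condition above ties the two grids together: for fixed $N$, a finer integration grid implies a coarser log-strike grid. A small numerical illustration, using the values of $N$ and $\eta$ that appear as defaults in the code later in this section, is the following.

```r
# Grid quantities implied by the condition lambda * eta = 2 * pi / N
N <- 2^12       # number of FFT points
eta <- 0.25     # spacing of the integration grid v_j
lambda <- (2 * pi)/(N * eta)    # spacing of the log-strike grid k_u
b <- 0.5 * N * lambda           # the log strikes start at -b
a <- N * eta                    # effective upper bound of integration
c(lambda = lambda, b = b, a = a)

# Halving eta doubles lambda when N is kept fixed
ratio <- ((2 * pi)/(N * eta/2))/lambda
ratio
```

This trade-off is one reason why the final price is obtained by interpolating between the grid points $k_u$.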

Finally, replacing this last condition and applying Simpson's rule to increase the
accuracy of the approximation of the integral (8.6) we obtain the following formula:

$$C_T(k_u) \approx \frac{e^{-\alpha k_u}}{\pi}\sum_{j=1}^{N} e^{-i\frac{2\pi}{N}(j-1)(u-1)}e^{ibv_j}\psi_T(v_j)\,\frac{\eta}{3}\left(3 + (-1)^j - \delta_{j-1}\right) \tag{8.8}$$

where $\delta_n$ is 1 for $n = 0$ and zero otherwise. The above calculations assume that
the initial value $S_0$ is one. In practice we should always set $k = \log(K/S_0)$ rather
than $k = \log K$ and thus the option price for the general case $S_0 \neq 1$ is given by
the formula $C_T^*(k_u) = S_0C_T(k_u)$. This is how the script is implemented. A further
note is that all calculations should be made using the characteristic function under
an equivalent martingale measure. As seen, one of the most effective ways is
to use the mean-correcting martingale measure approach of Section 8.1.3. This
means that, given the characteristic function $\varphi$ of $Z_1$, one transforms it into

$$\tilde\varphi(u) = \varphi(u)\exp(ium)$$

and the new location parameter is $m' = r - \log\varphi(-i)$. The next code takes as
arguments the strike price $K$, the initial value of the underlying asset $S_0$, the interest
rate $r$, the expiry date $T$ and the characteristic function of $Z_1$. Then it applies the
mean-correcting transform, calculates $\psi_T$ from $\varphi_T$, inverts the call price via the
FFT algorithm and finally rescales the price in Equation (8.8) by $S_0$ to obtain $C_T^*$.
Internally the algorithm applies a spline interpolation before returning the final
value. This algorithm is a version of what is available on http://quantcode.com/
and in Sengul (2008).

R> FFTcall.price <- function(phi, S0, K, r, T, alpha = 1,
+     N = 2^12, eta = 0.25) {
+     m <- r - log(phi(-(0+1i)))
+     phi.tilde <- function(u) (phi(u) * exp((0+1i) * u * m))^T
+     psi <- function(v) exp(-r * T) * phi.tilde((v - (alpha +
+         1) * (0+1i)))/(alpha^2 + alpha - v^2 + (0+1i) * (2 *
+         alpha + 1) * v)
+     lambda <- (2 * pi)/(N * eta)
+     b <- 1/2 * N * lambda
+     ku <- -b + lambda * (0:(N - 1))
+     v <- eta * (0:(N - 1))
+     tmp <- exp((0+1i) * b * v) * psi(v) * eta * (3 + (-1)^(1:N) -
+         ((1:N) - 1 == 0))/3
+     ft <- fft(tmp)
+     res <- exp(-alpha * ku) * ft/pi
+     inter <- spline(ku, Re(res), xout = log(K/S0))
+     return(inter$y * S0)
+ }


Now, recall that we know the exact formula of the Black and Scholes price of
the European call option and it is implemented in the GBSOption function in
package fOptions. We test the FFT method against the exact formula for the
geometric Brownian motion model, for which the characteristic function of $Z_1$ is

$$\varphi(u) = \exp\left(iu\left(\mu - \tfrac12\sigma^2\right) - \tfrac12\sigma^2u^2\right)$$

which we code as the phiBS function below:

R> phiBS <- function(u) exp((0+1i) * u * (mu - 0.5 * sigma^2) -
+     0.5 * sigma^2 * u^2)

We now price the European call option with

R> S0 <- 100
R> K <- 110
R> r <- 0.05
R> T <- 1/4
R> sigma <- 0.25
R> mu <- 1
R> require(fOptions)
R> GBSOption(TypeFlag = "c", S = S0, X = K, Time = T, r = r,
+     b = r, sigma = sigma)@price
[1] 1.980509
R> FFTcall.price(phiBS, S0 = S0, K = K, r = r, T = T)
[1] 1.984243

The two prices are quite close. Figure 8.1 shows a plot of the difference between
the price given by the exact formula and the FFT approximation. The
difference is very small and mostly due to numerical factors. The plot has been
generated using the following code:

R> K.seq <- seq(100, 120, length = 100)
R> exactP <- NULL
R> fftP <- NULL
R> for (K in K.seq) {
+     exactP <- c(exactP, GBSOption(TypeFlag = "c", S = S0, X = K,
+         Time = T, r = r, b = r, sigma = sigma)@price)
+     fftP <- c(fftP, FFTcall.price(phiBS, S0 = S0, K = K, r = r,
+         T = T))
+ }
R> plot(K.seq, exactP - fftP, type = "l", xlab = "strike price K")



Figure 8.1 Difference in the price of the European call option between the exact
Black and Scholes formula and the FFT approximation.

As a different example, we consider now a Variance Gamma process. This process
is obtained by evaluating the Brownian motion with drift $\theta$ and volatility $\sigma$ at a
random time given by the gamma process with mean rate 1 and variance rate $\nu$, i.e.

$$Z_t = B_\tau(\theta, \sigma) = \theta\tau + \sigma W_\tau$$

where $W_t$ is the standard Brownian motion and $\tau$ is a random variable distributed
as $\Gamma(t/\nu, 1/\nu)$. The exponential Levy process $S_t$ is then written in this form under
the risk neutral measure:

$$S_t = S_0e^{rt + Z_t + \omega t}$$

where

$$\omega = \frac{1}{\nu}\log\left(1 - \theta\nu - \frac{\sigma^2\nu}{2}\right)$$

is the compensator which ensures the martingale property. The characteristic
function of the logarithm of the process $S_t$ is the following:

$$\varphi_t(u) = \exp\left(iu\left(\log S_0 + (r + \omega)t\right)\right)\left(1 - i\theta\nu u + \frac{\sigma^2u^2\nu}{2}\right)^{-t/\nu}$$

which, for $t = 1$, we implement in the code phiVG below.

R> theta <- -0.1436
R> nu <- 0.3
R> r <- 0.1
R> sigma <- 0.12136
R> T <- 1
R> K <- 101
R> S <- 100
R> alpha <- 1.65
R> phiVG <- function(u) {
+     omega <- (1/nu) * (log(1 - theta * nu - sigma^2 * nu/2))
+     tmp <- 1 - (0+1i) * theta * nu * u + 0.5 * sigma^2 * u^2 *
+         nu
+     tmp <- tmp^(-1/nu)
+     exp((0+1i) * u * log(S0) + u * (r + omega) * (0+1i)) * tmp
+ }


We now need to pass this characteristic function to our FFTcall.price function

R> FFTcall.price(phiVG, S0 = S0, K = K, r = r, T = T)
[1] 10.98145

and obtain the result.

Figure 8.2 Difference in the price of the European call option between the Monte
Carlo price and the FFT approximation for Variance Gamma process.

We now want to compare the price given by the FFT
approach with a Monte Carlo price, so we need to simulate the terminal value
$S_T$. For simulation purposes, one can see that, conditional on the gamma time
change $\tau$, the Variance Gamma process $Z_t$ over an interval of length $t$ is normally
distributed with mean $\theta\tau$ and variance $\sigma^2\tau$, and thus one can simulate the process
using this representation

$$Z_t = \theta\tau + \sigma\sqrt{\tau}\,z$$

where $z \sim N(0, 1)$ is independent of $\tau \sim \text{Gamma}(t/\nu, 1/\nu)$. The next code simulates
$n$ terminal values of the VG process and calculates the Monte Carlo value
of the option.


R> n <- 50000
R> t <- rgamma(n, shape = T/nu, scale = nu)
R> N <- rnorm(n, 0, 1)
R> X <- theta * t + N * sigma * sqrt(t)
R> omega <- (1/nu) * (log(1 - theta * nu - sigma^2 * nu/2))
R> S <- S0 * exp(r * T + omega * T + X)
R> payoff <- sapply(S, function(x) max(x - K, 0))
R> mean(payoff) * exp(-r * T)
[1] 11.00923
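To judge whether the Monte Carlo and FFT prices agree, it helps to attach a standard error to the Monte Carlo estimate. The following self-contained sketch repeats the simulation with the same parameters and adds an approximate 95% confidence interval and a martingale check; the seed and the object names are arbitrary choices of ours.

```r
set.seed(123)   # arbitrary seed, only for reproducibility
theta <- -0.1436; nu <- 0.3; sigma <- 0.12136
r <- 0.1; T <- 1; K <- 101; S0 <- 100
n <- 50000

tau <- rgamma(n, shape = T/nu, scale = nu)   # gamma time change
z <- rnorm(n)
X <- theta * tau + sigma * sqrt(tau) * z     # VG increment over [0, T]
omega <- (1/nu) * log(1 - theta * nu - sigma^2 * nu/2)
ST <- S0 * exp((r + omega) * T + X)

disc.payoff <- exp(-r * T) * pmax(ST - K, 0)
mc.price <- mean(disc.payoff)
mc.se <- sd(disc.payoff)/sqrt(n)
# Approximate 95% confidence interval for the Monte Carlo price
c(price = mc.price, lower = mc.price - 1.96 * mc.se,
    upper = mc.price + 1.96 * mc.se)
# Martingale check: the discounted mean of S_T should be close to S0
mean(ST) * exp(-r * T)
```

If the FFT price falls inside the interval, the two methods agree within simulation error.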

As seen, these prices are quite close; we can run an experiment similar to what
we did for the Black and Scholes model and the results are given in Figure 8.2
without any comment on the code.

R> K.seq <- seq(100, 120, length = 100)
R> mcP <- NULL
R> fftP <- NULL
R> for (K in K.seq) {
+     t <- rgamma(n, shape = T/nu, scale = nu)
+     N <- rnorm(n, 0, 1)
+     X <- theta * t + N * sigma * sqrt(t)
+     S <- S0 * exp(r * T + omega * T + X)
+     payoffvec <- sapply(S, function(x) max(x - K, 0))
+     tmp <- mean(payoffvec) * exp(-r * T)
+     mcP <- c(mcP, tmp)
+     fftP <- c(fftP, FFTcall.price(phiVG, S0 = S0, K = K, r = r,
+         T = T))
+ }
R> plot(K.seq, mcP - fftP, type = "l", xlab = "strike price K")

These results show that, at least for this configuration of the parameters, the FFT 
approach provides a reasonable approximation of the price and FFT is almost 
instantaneous compared to the Monte Carlo approach. Unfortunately this is not 
always the case as noticed by several authors and several modifications to the 
FFT algorithm have been proposed. One notable variation is the so-called Lewis 
regularization method which we do not present here. For further reading see 
Lewis (2001). 


8.2 Pricing under the jump telegraph process 


Following Ratanov (2007a), let $\{\sigma_t, t \ge 0\}$ be a Markov process with values $\pm 1$
and transition probability intensities $\lambda_\pm$ such that, as $\Delta t \to 0$,

$$P(\sigma(t + \Delta t) = +1 \mid \sigma(t) = -1) = \lambda_-\Delta t + o(\Delta t),$$

$$P(\sigma(t + \Delta t) = -1 \mid \sigma(t) = +1) = \lambda_+\Delta t + o(\Delta t).$$

From this process, we build the new processes $c_{\sigma_t} = c_\pm$, $h_{\sigma_t} = h_\pm > -1$ and
$r_{\sigma_t} = r_\pm > 0$. Further we denote by $X_t^\sigma = \int_0^t c_{\sigma_s}\,ds$ the telegraph process and we
introduce a new jump process $J_t^\sigma$, with alternating jumps of sizes $h_\pm$. The risk-free
asset $\{B_t, t \ge 0\}$ is given by the exponential of the process $Y_t^\sigma = \int_0^t r_{\sigma_s}\,ds$, i.e.
$B_t = \exp(Y_t^\sigma)$, which means that the current interest rate depends on the market
state. Everywhere the superscript $\sigma$ indicates the starting value of the process $\sigma_t$,
i.e. $\sigma = \sigma_0$. The process $r_{\sigma_t}$ captures the movement of interest rates and the
jump process $J_t^\sigma$ the market jumps or crashes. The driving process $\sigma_t$ is the only
source of randomness in this model and represents contractions or expansions
of the market. In this setup the risky asset follows this stochastic differential
equation

$$dS_t^\sigma = S_{t-}^\sigma\,d(X_t^\sigma + J_t^\sigma).$$

According to Ratanov (2007a), the solution of this equation has the following
exponential form:

$$S_t^\sigma = S_0\,\mathcal{E}_t(X^\sigma + J^\sigma) = S_0\,e^{X_t^\sigma}\kappa_t^\sigma, \qquad S_0 = S_0^\sigma,$$

where

$$\kappa_t^\sigma = \prod_{\tau \le t}(1 + \Delta J_\tau^\sigma) = \prod_{j=1}^{N_t^\sigma}\left(1 + h_{\sigma_{\tau_j-}}\right)$$

and $\tau_j$, $j \ge 1$, are the jump time instants of the process and $N^\sigma$ is the Poisson
process with alternating intensities $\lambda_+$ and $\lambda_-$. The symbol $\mathcal{E}_t(\cdot)$ is called the
stochastic exponential and is widely used in the analysis of jump processes (see
Jacod and Shiryaev 2003). Ratanov (2007a) proved that $X^\sigma + J^\sigma$ is a martingale
if and only if the following relationships hold:

$$h_\sigma = -\frac{c_\sigma}{\lambda_\sigma}, \qquad \sigma = \pm 1,$$

and since the model is driven by a single source of noise, the market can only
have a unique martingale measure.


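Theorem 8.2.1 (Ratanov (2007a)) Let $Z_t = \mathcal{E}_t(X^* + J^*)$, $t \ge 0$, with
$h_\sigma^* = -c_\sigma^*/\lambda_\sigma$, be the Radon-Nikodym density of the probability $P^*$ with respect
to $P$, i.e.

$$Z_t = \frac{dP^*}{dP} = \mathcal{E}_t(X^* + J^*) = e^{X_t^*}\kappa_t^*, \qquad t \ge 0.$$

Then, the process

$$B_t^{-1}S_t^\sigma, \qquad t \ge 0,$$

is a martingale with respect to the measure $P^*$ if and only if

$$c_\sigma^* = \lambda_\sigma + \frac{c_\sigma - r_\sigma}{h_\sigma}, \qquad \sigma = \pm 1.$$

Under $P^*$, the Markov process has the new intensities

$$\lambda_\sigma^* = \frac{r_\sigma - c_\sigma}{h_\sigma} > 0, \qquad \sigma = \pm 1.$$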
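The quantities of Theorem 8.2.1 are simple to compute once the model parameters are fixed. The sketch below uses purely illustrative values, chosen so that both new intensities are positive, and checks that the formulas are mutually consistent, since algebraically $\lambda_\sigma(1 + h_\sigma^*) = \lambda_\sigma - c_\sigma^* = \lambda_\sigma^*$.

```r
# Illustrative parameter values (assumptions); the first entry is the
# state sigma = +1, the second entry the state sigma = -1
lambda <- c(plus = 1.0, minus = 2.0)    # intensities under P
cc <- c(plus = 0.10, minus = -0.05)     # velocities c_sigma
h <- c(plus = -0.10, minus = 0.20)      # jump sizes h_sigma > -1
r <- c(plus = 0.05, minus = 0.03)       # interest rates r_sigma

c.star <- lambda + (cc - r)/h           # condition of Theorem 8.2.1
h.star <- -c.star/lambda                # jumps of the density process
lambda.star <- (r - cc)/h               # intensities under P*

# Both intensities must be positive for P* to be well defined
lambda.star
# Consistency check: lambda.star equals lambda * (1 + h.star)
all.equal(lambda.star, lambda * (1 + h.star))
```

Note that the sign of $h_\sigma$ has to match the sign of $r_\sigma - c_\sigma$ in each state, otherwise the new intensity would be negative.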

A pricing formula to complete the theory is available as well. Let $f(\cdot)$ denote
the payoff function of a $T$-claim and introduce the pricing function

$$F(t, x, \sigma) = E^*\left\{e^{-Y_{T-t}}f\left(x\,e^{X_{T-t}}\kappa_{T-t}\right) \,\middle|\, \sigma_0 = \sigma\right\}, \qquad \sigma = \pm 1, \quad 0 \le t \le T.$$

Then $F(\cdot, \cdot, \cdot)$ solves the following difference-differential equation:

$$F_t(t, x, \sigma) + c_\sigma xF_x(t, x, \sigma) = \left(r_\sigma + \frac{r_\sigma - c_\sigma}{h_\sigma}\right)F(t, x, \sigma) - \frac{r_\sigma - c_\sigma}{h_\sigma}F(t, x(1 + h_\sigma), -\sigma), \qquad \sigma = \pm 1,$$

with terminal condition $F|_{t=T} = f(x)$. The above equation is the analogue of the
fundamental equation (6.1) in the Black and Scholes model. This equation has
the following explicit solution

$$F(t, x, \sigma) = e^{-b_r(T-t)}\sum_{n=0}^{\infty}\int_{-\infty}^{\infty} e^{-a_ry}f\left(x\,e^{y}\kappa_n^\sigma\right)p(y, T - t; n)\,dy,$$

where $p(\cdot, \cdot; n)$ is the transition density of the telegraph process after $n$ jumps and

$$a_r = \frac{r_+ - r_-}{c_+ - c_-}, \qquad b_r = \frac{c_+r_- - c_-r_+}{c_+ - c_-}.$$

Notice that the equation does not depend on $\lambda_\pm$, just as the original Black and Scholes
equation does not depend on the drift $\mu$. When $\lambda_- = \lambda_+ = \lambda$ the formula simplifies
and the price of the call can be written in a way similar to the standard Black
and Scholes formula. We do not investigate this model further, but a more in-depth
analysis, such as the convergence of the model to the Black and Scholes model, can
be found in the following series of papers: Ratanov (2005a,b, 2007a,b). Earlier
works which use the telegraph process as the basis of financial models are Di
Masi et al. (1994) and Di Crescenzo and Pellerey (2002).

8.3 Markov switching diffusions 

In Section 3.20 we introduced stochastic differential equations with Markov
switching regimes. We briefly recall the components of this model. A Markov
switching diffusion is a process $X_t$ solution to

$$\mathrm{d}X_t = f(X_t, \alpha_t)\,\mathrm{d}t + g(X_t, \alpha_t)\,\mathrm{d}B_t, \qquad X_0 = x_0, \quad \alpha(0) = \alpha, \tag{8.9}$$

where $\{B_t, t \ge 0\}$ is an $n$-dimensional Brownian motion, $\alpha_t$ is a finite-state
Markov chain in continuous time with state space $S$ and generator $Q = [q_{ij}]$,
$f(\cdot, \cdot) : \mathbb{R}^r \times S \to \mathbb{R}^r$ and $g(\cdot, \cdot) : \mathbb{R}^r \times S \to \mathbb{R}^{r \times n}$. The initial value $x_0$, the
Brownian motion $B$ and the Markov chain $\alpha_t$ are all mutually independent. An
example of this model in $\mathbb{R}$ is the geometric Brownian motion with switching

$$\mathrm{d}S_t = \mu(\alpha_t) S_t\,\mathrm{d}t + \sigma(\alpha_t) S_t\,\mathrm{d}B_t, \qquad S_0 = s_0,$$

where $\mu(i)$, $i \in S$, is the expected return and $\sigma(i) > 0$, $i \in S$, represents the stock
volatility, with $f(x, y) = \mu(y)x$ and $g(x, y) = \sigma(y)x$ in the general stochastic
differential equation (8.9). This model is intended to capture macro fluctuations
of the market which temporarily change the set of parameters $\mu$ and $\sigma$ during the
period. It is therefore a slight generalization of the standard Black and Scholes model,
in which $\mu$ and $\sigma$ are constant. For example, for a Markov chain $\alpha_t$ with only two
states, i.e. $S = \{0, 1\}$, $\alpha_t = 1$ may indicate that the market is in expansion and
$\mu(1) = \mu_1$ and $\sigma(1) = \sigma_1$ are the trend and volatility parameters of the standard
Black and Scholes model during market expansion; $\alpha_t = 0$ implies $\mu(0) = \mu_0$
and $\sigma(0) = \sigma_0$, another set of parameters during market contraction. The Markov
process $\alpha_t$ can also be of the type $\alpha_t = (\alpha^1_t, \alpha^2_t)$ where $\alpha^1_t$ models the trend at
time $t$ and $\alpha^2_t$ the volatility at time $t$. For simplicity we now write $\mu_{\alpha_t} = \mu(\alpha_t)$
and $\sigma_{\alpha_t} = \sigma(\alpha_t)$ so that we have the following system of differential equations:

$$\mathrm{d}S_t = S_t \mu_{\alpha_t}\,\mathrm{d}t + S_t \sigma_{\alpha_t}\,\mathrm{d}B_t,$$
$$\mathrm{d}R_t = r R_t\,\mathrm{d}t$$

for $t \ge 0$, where $R_t$ is a bond-type risk-free asset. We assume that $S = \{0, 1\}$
so there are only the four parameters $\mu_i$, $\sigma_i$, $i = 0, 1$. Clearly $\mu_1 = \mu_0$ and
$\sigma_1 = \sigma_0$ correspond to the standard Black and Scholes model. We assume that
the continuous time Markov chain changes its state from $i$ to $j$ with rate $\lambda_i$, and
the waiting time in state $i$ is denoted by $\tau_i$. Then

$$P(\tau_i > t) = e^{-\lambda_i t}, \qquad i = 0, 1.$$
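The exponential waiting times suggest a direct way to simulate a path of the two-state chain. The following is only a minimal base-R sketch under that assumption; the function name `simCTMC2` and its arguments are hypothetical and are not the code used elsewhere in this chapter.

```r
# Simulate a two-state continuous-time Markov chain on {0, 1} up to time T.
# lambda[1] is the rate of leaving state 0, lambda[2] the rate of leaving state 1.
# simCTMC2 is a hypothetical helper, not part of any package.
simCTMC2 <- function(a0, lambda, T) {
    times <- 0        # jump times (the first entry is the starting time)
    states <- a0      # state entered at each of those times
    t <- 0
    a <- a0
    repeat {
        # exponential holding time in the current state
        t <- t + rexp(1, rate = lambda[a + 1])
        if (t >= T) break
        a <- 1 - a    # with only two states the chain must flip
        times <- c(times, t)
        states <- c(states, a)
    }
    list(times = times, states = states)
}

set.seed(1)
path <- simCTMC2(a0 = 0, lambda = c(1, 1), T = 5)
# total time spent in state 0 on [0, 5]
holding <- diff(c(path$times, 5))
T0 <- sum(holding[path$states == 0])
```

The quantity `T0` computed at the end is the occupation time of state 0, which plays a central role in the pricing formula that follows.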


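To live in a market without arbitrage, we need to find an equivalent martingale
measure such that the discounted process $S^d_t = e^{-rt} S_t$ is a martingale under the
new measure. We now describe how to get the martingale measure, which is
similar to what we saw in Section 6.4. Let $f(t, x) = e^{-rt} x$ and apply the Itô
formula to $f(t, S_t)$. Clearly we have that

$$f_t(t, x) = -r e^{-rt} x, \qquad f_x(t, x) = e^{-rt}, \qquad f_{xx}(t, x) = 0,$$

therefore, by the Itô formula we have

$$\begin{aligned}
f(t, S_t) &= f(0, S_0) + \int_0^t \left\{ f_t(u, S_u) + f_x(u, S_u) \mu_{\alpha_u} S_u \right\} \mathrm{d}u + \int_0^t f_x(u, S_u) \sigma_{\alpha_u} S_u\,\mathrm{d}B_u \\
&= S_0 + \int_0^t \left\{ -r e^{-ru} S_u + e^{-ru} \mu_{\alpha_u} S_u \right\} \mathrm{d}u + \int_0^t e^{-ru} \sigma_{\alpha_u} S_u\,\mathrm{d}B_u \\
&= S_0 + \int_0^t (\mu_{\alpha_u} - r) e^{-ru} S_u\,\mathrm{d}u + \int_0^t e^{-ru} \sigma_{\alpha_u} S_u\,\mathrm{d}B_u,
\end{aligned}$$

thus

$$S^d_t = S^d_0 + \int_0^t (\mu_{\alpha_u} - r) S^d_u\,\mathrm{d}u + \int_0^t \sigma_{\alpha_u} S^d_u\,\mathrm{d}B_u$$

and then

$$\mathrm{d}S^d_t = (\mu_{\alpha_t} - r) S^d_t\,\mathrm{d}t + \sigma_{\alpha_t} S^d_t\,\mathrm{d}B_t.$$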


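Now we can define a new Brownian motion under the new measure $Q$:

$$W_t = \int_0^t \frac{\mu_{\alpha_u} - r}{\sigma_{\alpha_u}}\,\mathrm{d}u + B_t.$$

Now replacing $B_t$ with $W_t$ in the stochastic differential equation of $S^d_t$ we get

$$\begin{aligned}
\mathrm{d}S^d_t &= S^d_t \left\{ (\mu_{\alpha_t} - r)\,\mathrm{d}t + \sigma_{\alpha_t}\,\mathrm{d}B_t \right\} \\
&= S^d_t \left\{ (\mu_{\alpha_t} - r)\,\mathrm{d}t + \sigma_{\alpha_t} \left( \mathrm{d}W_t - \frac{\mu_{\alpha_t} - r}{\sigma_{\alpha_t}}\,\mathrm{d}t \right) \right\} \\
&= \sigma_{\alpha_t} S^d_t\,\mathrm{d}W_t
\end{aligned}$$

which shows, by the properties of the Itô integral, that $S^d_t$ is a martingale. In this
setup

$$\frac{\mathrm{d}Q}{\mathrm{d}P} = \exp\left\{ -\int_0^T \frac{\mu_{\alpha_u} - r}{\sigma_{\alpha_u}}\,\mathrm{d}B_u - \frac{1}{2} \int_0^T \left( \frac{\mu_{\alpha_u} - r}{\sigma_{\alpha_u}} \right)^2 \mathrm{d}u \right\}$$

is the Radon-Nikodym derivative of $Q$ with respect to $P$ given by Girsanov's theorem
3.16.1, so clearly $P$ and $Q$ are equivalent measures. Under the new measure we
also have that

$$\mathrm{d}S_t = r S_t\,\mathrm{d}t + \sigma_{\alpha_t} S_t\,\mathrm{d}W_t,$$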

by replacing $S^d_t$ by $e^{-rt} S_t$. Now that we have a martingale measure we need
to ensure completeness of the market. This market is not complete because we
have an additional noise process $\alpha_t$ which is not adapted to the filtration of
the Brownian motion, say $\{\mathcal{F}_t, t \ge 0\}$, but only to the filtration generated by $S_t$,
say $\{\mathcal{G}_t, t \ge 0\}$; thus, according to Harrison and Pliska (1981), the market is not
complete as it is. In order to have a complete market, Guo (2001) suggests that a
new security be introduced into the market. That is, at each time $t$, there exists a
security that pays one unit of account (say \$) at the next time $\tau(t) = \inf\{u > t \mid \alpha_u \ne \alpha_t\}$
at which the continuous-time Markov chain switches its state. One can think of
this as an insurance contract that compensates its holder for any losses that occur
when the next state change occurs. This security is called a COS ('change-of-state')
contract. According to Guo (2001), the COS contract should be traded for a
price of

$$V_t = \mathrm{E}\left\{ e^{-(r + k(\alpha_t))(\tau(t) - t)} \,\middle|\, \mathcal{G}_t \right\}$$

where $k : \{0, 1\} \to \mathbb{R}$ is a given value that can be considered as a risk-premium
coefficient. More precisely, the COS price is given by the formula:

$$V_t = J(\alpha_t), \qquad \text{with } J(i) = \frac{\lambda(i)}{r + k(i) + \lambda(i)}$$

and $\lambda(\alpha_t)$ is the intensity of the point process $N$ that counts the changes of state
in the Markov chain. Under the measure $Q$, the price $V_t$ takes this form:

$$V_t = \mathrm{E}_Q\left\{ e^{-r(\tau(t) - t)} \,\middle|\, \mathcal{G}_t \right\}$$

and the price of the COS security is zero after the next change of state. Under
the martingale measure $Q$, the counting process $N$ has intensity $\lambda^Q(\alpha_t)$, thus

$$V_t = J^Q(\alpha_t), \qquad \text{with } J^Q(i) = \frac{\lambda^Q(i)}{r + \lambda^Q(i)}.$$
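The two expressions for the COS price can be compared numerically. The sketch below is illustrative only: the parameter values are assumptions, not taken from the text, and `J`, `JQ` are hypothetical helper names. It solves $J(i) = J^Q(i)$ for the risk-neutral intensity with `uniroot`, so that the result can be checked against the closed form stated next.

```r
# COS price under the physical measure: lambda / (r + k + lambda)
J <- function(lambda, k, r) lambda / (r + k + lambda)
# COS price under Q: lambdaQ / (r + lambdaQ)
JQ <- function(lambdaQ, r) lambdaQ / (r + lambdaQ)

# illustrative values (assumptions)
r <- 0.05
lambda <- 1
k <- 0.02

# risk-neutral intensity that makes the two prices agree
lambdaQ <- uniroot(function(l) JQ(l, r) - J(lambda, k, r),
                   interval = c(0, 100), tol = 1e-12)$root
lambdaQ
```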

Now, due to the fact that $J = J^Q$, we also have that

$$\lambda^Q(i) = \frac{r \lambda(i)}{r + k(i)}.$$

Similarly, the underlying risky asset under the new measure $Q$ solves the
following stochastic differential equation

$$\mathrm{d}S_t = (r - d_{\alpha_t}) S_t\,\mathrm{d}t + S_t \sigma_{\alpha_t}\,\mathrm{d}W_t \tag{8.10}$$

where $d_{\alpha_t} = r - \mu_{\alpha_t}$. The solution of the above stochastic differential equation is

$$S_t = S_0 \exp\left\{ \int_0^t \left( r - d_{\alpha_u} - \frac{1}{2} \sigma^2_{\alpha_u} \right) \mathrm{d}u + \int_0^t \sigma_{\alpha_u}\,\mathrm{d}W_u \right\}$$

with initial condition $S_0$. As usual, the proof can be done using the Itô formula as in
the standard case of the geometric Brownian motion process. We are now ready
to express the pricing formula for the European call option. Remember that the
payoff function for the call is $f(x) = \max(x - K, 0)$.

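Theorem 8.3.1 (Guo 2001) Let $\alpha_t$ be a continuous time Markov process with
state space $S = \{0, 1\}$ and $T_i$ be the random variable which measures the total
time between 0 and $T$ during which $\alpha_t = 0$ starting from state $i = 0, 1$. Assume
we have a COS security, a risk-free interest rate $r$ and a martingale measure $Q$ such
that the price $S_t$ satisfies (8.10). Then the arbitrage-free price of a European call
option with maturity $T$ and strike price $K$ is

$$V_i(T, K, r) = \mathrm{E}_Q\left\{ e^{-rT} f(S_T) \,\middle|\, \alpha_0 = i \right\} = e^{-rT} \int_0^\infty \int_0^T \frac{y}{y + K}\, \varphi(\log(y + K); m_t, v_t)\, f_i(t, T)\,\mathrm{d}t\,\mathrm{d}y \tag{8.11}$$

where $\varphi(x; m, v)$ is the density of the Gaussian random variable $N(m, v)$, with

$$m_t = \log(S_0) + \left( d_1 - d_0 - \frac{1}{2} (\sigma_0^2 - \sigma_1^2) \right) t + \left( r - d_1 - \frac{1}{2} \sigma_1^2 \right) T,$$
$$v_t = (\sigma_0^2 - \sigma_1^2)\, t + \sigma_1^2 T,$$

and $f_i(\cdot, T)$ is the probability distribution of $T_i$, $i = 0, 1$:

$$f_0(t, T) = e^{-\lambda_0 T} \delta_0(T - t) + e^{-\lambda_1 (T - t) - \lambda_0 t} \left[ \lambda_0 I_0\left( 2 \sqrt{\lambda_0 \lambda_1 t (T - t)} \right) + \sqrt{\frac{\lambda_0 \lambda_1 t}{T - t}}\, I_1\left( 2 \sqrt{\lambda_0 \lambda_1 t (T - t)} \right) \right],$$

$$f_1(t, T) = e^{-\lambda_1 T} \delta_0(t) + e^{-\lambda_1 (T - t) - \lambda_0 t} \left[ \lambda_1 I_0\left( 2 \sqrt{\lambda_0 \lambda_1 t (T - t)} \right) + \sqrt{\frac{\lambda_0 \lambda_1 (T - t)}{t}}\, I_1\left( 2 \sqrt{\lambda_0 \lambda_1 t (T - t)} \right) \right],$$

where $\lambda_0$, $\lambda_1$ are the total rates of leaving state 0 and state 1 respectively, $I_\nu(x)$
is the modified Bessel function of order $\nu$ (see Abramowitz and Stegun 1964)
and $\delta_0$ is the Dirac delta function.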
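A quick sanity check of the occupation-time distribution is that its continuous part and its atom add up to one. The sketch below assumes the Bessel-function form of the density of $T_0$ attributed to Guo (2001); the function name `f0c` and the parameter values are illustrative assumptions.

```r
# Continuous part of the density of T_0, the time spent in state 0 on [0, T]
# when the chain starts in state 0 (assumed Bessel-function form).
f0c <- function(t, T, L0, L1) {
    z <- 2 * sqrt(L0 * L1 * t * (T - t))
    exp(-L1 * (T - t) - L0 * t) *
        (L0 * besselI(z, 0) + sqrt(L0 * L1 * t / (T - t)) * besselI(z, 1))
}

# illustrative values (assumptions)
T <- 1
L0 <- 1
L1 <- 2

cont <- integrate(function(t) f0c(t, T, L0, L1), 0, T, rel.tol = 1e-10)$value
atom <- exp(-L0 * T)   # probability that the chain never leaves state 0
total <- cont + atom   # total mass of the distribution of T_0
total
```

The atom at $t = T$ corresponds to paths with no jump at all, which is why it carries mass $e^{-\lambda_0 T}$.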

Notice that if $\mu_0 = \mu_1$ and $\sigma_0 = \sigma_1$, and $\lambda_0 = \lambda_1 = 0$, then $m_t$ and $v_t$ do not
depend on $t$ and the pricing equation falls back to the standard Black and Scholes
solution. Although Equation (8.11) is explicit, even numerically it is quite hard
to compute. Fuh et al. (2002) proved an approximation formula based on the
following asymptotic argument. The random variable $T_i/T \to \pi_i$ in probability
as $T \to \infty$, where $\pi_i$ is the element of the stationary distribution of $\alpha_t$
corresponding to state $i \in S$. Then $T_i$ can be replaced by the quantity $\pi_i T$ in
Equation (8.11) and, as $T \to \infty$, we have that


f 0 (t,T) = xln(t,T) = 



when t — T, 
when t T. 


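Then, Equation (8.11) becomes

$$\begin{aligned}
V_i = e^{-rT} \int_0^\infty \frac{y}{y + K} \Big[ & \varphi(\log(y + K); m(\pi_0 T), v(\pi_0 T)) \left( 1 - e^{-\lambda_0 T} \delta_0(i - 0) - e^{-\lambda_1 T} \delta_0(1 - i) \right) \\
& + \varphi(\log(y + K); m(T), v(T))\, e^{-\lambda_0 T} \delta_0(i - 0) \\
& + \varphi(\log(y + K); m(0), v(0))\, e^{-\lambda_1 T} \delta_0(1 - i) \Big]\,\mathrm{d}y,
\end{aligned}$$

where $m(t) = m_t$ and $v(t) = v_t$, which gives the two solutions $V_0$ and $V_1$ below:

$$\begin{aligned}
V_0 &= e^{-rT} \int_0^\infty \frac{y}{y + K} \Big[ \varphi(\log(y + K); m(\pi_0 T), v(\pi_0 T)) \left( 1 - e^{-\lambda_0 T} \right) + \varphi(\log(y + K); m(T), v(T))\, e^{-\lambda_0 T} \Big]\,\mathrm{d}y, \\
V_1 &= e^{-rT} \int_0^\infty \frac{y}{y + K} \Big[ \varphi(\log(y + K); m(\pi_0 T), v(\pi_0 T)) \left( 1 - e^{-\lambda_1 T} \right) + \varphi(\log(y + K); m(0), v(0))\, e^{-\lambda_1 T} \Big]\,\mathrm{d}y.
\end{aligned} \tag{8.12}$$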


These two approximations require only one-dimensional integration, which is
faster and numerically more accurate than the double integral in (8.11). We
present the code for $V_0$ and $V_1$ below; both functions read the parameters
d0 and d1 from the workspace.

R> V0 <- function(S0, K, T, r, s0, s1, L0, L1, p0) {
+     # mean and variance of log(S_T) given a time t spent in state 0
+     m <- function(t) log(S0) + (d1 - d0 - 0.5 * (s0^2 - s1^2)) *
+         t + (r - d1 - 0.5 * s1^2) * T
+     v <- function(t) (s0^2 - s1^2) * t + s1^2 * T
+     # integrand of (8.12) for the initial state 0
+     f <- function(y) {
+         y/(y + K) * (dnorm(log(y + K), m(p0 * T), sqrt(v(p0 *
+             T))) * (1 - exp(-L0 * T)) + dnorm(log(y + K), m(T),
+             sqrt(v(T))) * exp(-L0 * T))
+     }
+     integrate(f, 0, Inf, subdivisions = 1000)$value * exp(-r *
+         T)
+ }
R> V1 <- function(S0, K, T, r, s0, s1, L0, L1, p0) {
+     # mean and variance of log(S_T) given a time t spent in state 0
+     m <- function(t) log(S0) + (d1 - d0 - 0.5 * (s0^2 - s1^2)) *
+         t + (r - d1 - 0.5 * s1^2) * T
+     v <- function(t) (s0^2 - s1^2) * t + s1^2 * T
+     # integrand of (8.12) for the initial state 1
+     f <- function(y) {
+         y/(y + K) * (dnorm(log(y + K), m(p0 * T), sqrt(v(p0 *
+             T))) * (1 - exp(-L1 * T)) + dnorm(log(y + K), m(0),
+             sqrt(v(0))) * exp(-L1 * T))
+     }
+     integrate(f, 0, Inf, subdivisions = 1000)$value * exp(-r *
+         T)
+ }
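One way to gain confidence in the approximation is to check its no-switching limit: with a zero rate of leaving state 0, the integral should reproduce the Black and Scholes price with volatility $\sigma_0$. The sketch below is self-contained and only illustrative; `V0chk` mirrors the integrand of `V0` but takes d0 and d1 as explicit arguments, and `bsCall` is a plain Black and Scholes formula. Both names are hypothetical.

```r
# Black and Scholes call price without dividends, used as reference value
bsCall <- function(S0, K, T, r, sigma) {
    d1 <- (log(S0 / K) + (r + 0.5 * sigma^2) * T) / (sigma * sqrt(T))
    d2 <- d1 - sigma * sqrt(T)
    S0 * pnorm(d1) - K * exp(-r * T) * pnorm(d2)
}

# Same integrand as V0, with d0 and d1 passed explicitly (hypothetical helper)
V0chk <- function(S0, K, T, r, s0, s1, L0, p0, d0 = 0, d1 = 0) {
    m <- function(t) log(S0) + (d1 - d0 - 0.5 * (s0^2 - s1^2)) * t +
        (r - d1 - 0.5 * s1^2) * T
    v <- function(t) (s0^2 - s1^2) * t + s1^2 * T
    f <- function(y) y / (y + K) *
        (dnorm(log(y + K), m(p0 * T), sqrt(v(p0 * T))) * (1 - exp(-L0 * T)) +
         dnorm(log(y + K), m(T), sqrt(v(T))) * exp(-L0 * T))
    integrate(f, 0, Inf, subdivisions = 1000)$value * exp(-r * T)
}

# with L0 = 0 the chain never leaves state 0
noSwitch <- V0chk(100, 110, 1, 0.1, 0.2, 0.4, L0 = 0, p0 = 0.5)
reference <- bsCall(100, 110, 1, 0.1, 0.2)
c(noSwitch, reference)
```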

Clearly, if $\lambda_i = 0$, the value $V_i$ converges to the corresponding Black and Scholes
price under state $i$. Figure 8.3 shows this behaviour for $\sigma_0 = 0.2$, $\sigma_1 = 0.4$,
$r = 0.1$, $d_0 = d_1 = 0$, $S_0 = 100$, $K = 110$ and $\pi_0 = 0.5$ for different maturity
dates $T \in (0, 1]$ when $\lambda_0 = \lambda_1 = 1$.

R> require(fOptions)
R> r <- 0.1
R> s0 <- 0.2
R> s1 <- 0.4
R> L0 <- 1
R> L1 <- 1
R> K <- 110
R> S0 <- 100
R> d1 <- 0
R> d0 <- 0
R> p0 <- 0.5
R> tt <- seq(0, 1, length = 50)
R> pV0 <- NULL
R> pV1 <- NULL
R> pBS0 <- NULL
R> pBS1 <- NULL
R> # prices for each maturity: switching model and Black and Scholes
R> for (T in tt) {
+     pV0 <- c(pV0, V0(S0, K, T, r, s0, s1, L0, L1, p0))
+     pV1 <- c(pV1, V1(S0, K, T, r, s0, s1, L0, L1, p0))
+     pBS0 <- c(pBS0, GBSOption(TypeFlag = "c", S = S0, X = K,
+         Time = T, r = r, b = r, sigma = s0)@price)
+     pBS1 <- c(pBS1, GBSOption(TypeFlag = "c", S = S0, X = K,
+         Time = T, r = r, b = r, sigma = s1)@price)
+ }
R> matplot(tt, cbind(pV0, pV1, pBS0, pBS1), type = "o", lty = rep(1,
+     4), pch = c(0, 1, 15, 16), cex = 0.5, col = rep(1, 4), main = "",
+     xlab = expression(T))
R> legend(0.1, 12, c(expression(tilde(V)[0]), expression(tilde(V)[1]),
+     expression(BS[0]), expression(BS[1])), lty = rep(1, 4), pch = c(0,
+     1, 15, 16), cex = 0.75, col = rep(1, 4))

From Figure 8.3 we notice that the prices under the Markov switching model are
always between the corresponding Black and Scholes prices. When $\lambda_0$ increases,
the price $V_0$ moves away from the Black and Scholes price $BS_0$, while when $\lambda_1$
decreases to zero, $V_1$ converges to $BS_1$ (see Figure 8.4). Finally, in Figure 8.5,
we show that when both $\lambda_0$ and $\lambda_1$ increase, the two prices $V_0$ and $V_1$ move
away from the Black and Scholes prices and both converge to a common limit.

Figure 8.3 Difference between Black and Scholes price ($BS_0$ and $BS_1$) and the
price under the Markov switching diffusion ($V_0$ and $V_1$). Parameters: $\sigma_0 = 0.2$,
$\sigma_1 = 0.4$, $r = 0.1$, $d_0 = d_1 = 0$, $S_0 = 100$, $K = 110$ and $\pi_0 = 0.5$ for different
maturity dates $T \in (0, 1]$ when $\lambda_0 = \lambda_1 = 1$.

This model and the approximation formula can be generalized to the case of
a Markov chain $\alpha_t$ with state space $S = \{0, 1, \ldots, N\}$. In this case the pricing
formula of the call option is given by the following expression:

$$\begin{aligned}
V_i = e^{-rT} \int_0^\infty \frac{y}{y + K} \Big[ & \varphi(\log(y + K); m(t_0, t_1, \ldots, t_N), v(t_0, t_1, \ldots, t_N)) \left( 1 - e^{-\lambda_0 T} \delta_0(0 - i) - e^{-\lambda_N T} \delta_0(N - i) \right) \\
& + \varphi(\log(y + K); m(T, 0, \ldots, 0), v(T, 0, \ldots, 0))\, e^{-\lambda_0 T} \delta_0(0 - i) \\
& + \varphi(\log(y + K); m(0, 0, \ldots, T), v(0, 0, \ldots, T))\, e^{-\lambda_N T} \delta_0(N - i) \Big]\,\mathrm{d}y
\end{aligned}$$

where $t_i = \pi_i T$, $i = 0, \ldots, N$, and

$$m(t_0, t_1, \ldots, t_N) = \log(S_0) + \sum_{j=0}^{N} \left( r - d_j - \frac{1}{2} \sigma_j^2 \right) t_j,$$

$$v(t_0, t_1, \ldots, t_N) = \sum_{j=0}^{N} \sigma_j^2 t_j.$$


Figure 8.4 Difference between Black and Scholes price ($BS_0$ and $BS_1$) and the
price under the Markov switching diffusion ($V_0$ and $V_1$). Parameters: $\sigma_0 = 0.2$,
$\sigma_1 = 0.4$, $r = 0.1$, $d_0 = d_1 = 0$, $S_0 = 100$, $K = 110$ and $\pi_0 = 0.5$ for different
maturity dates $T \in (0, 1]$ when $\lambda_0 = 10$, $\lambda_1 = 0.1$. The price $V_0$ moves up while
the price $V_1$ converges to $BS_1$.



Figure 8.5 Difference between Black and Scholes price ($BS_0$ and $BS_1$) and the
price under the Markov switching diffusion ($V_0$ and $V_1$). Parameters: $\sigma_0 = 0.2$,
$\sigma_1 = 0.4$, $r = 0.1$, $d_0 = d_1 = 0$, $S_0 = 100$, $K = 110$ and $\pi_0 = 0.5$ for different
maturity dates $T \in (0, 1]$ when $\lambda_0 = 10$, $\lambda_1 = 10$. The price $V_0$ moves up and $V_1$
moves down and both converge to the same curve.

8.3.1 Monte Carlo pricing

As for the other models, it is possible to price options using the Monte Carlo
approach. In our case, we make use of the code simMSdiff presented in Section
4.5.7.

R> simMarkov <- function(x0, n, x, P) {
+     # discrete-time chain with transition matrix P on the states x
+     mk <- numeric(n + 1)
+     mk[1] <- x0
+     state <- which(x == x0)
+     for (i in 1:n) {
+         mk[i + 1] <- sample(x, 1, prob = P[state, ])
+         state <- which(x == mk[i + 1])
+     }
+     return(ts(mk))
+ }
R> simMSdiff <- function(x0, a0, S, delta, n, f, g, Q) {
+     require(msm)
+     # transition matrix of the chain observed every delta units of time
+     P <- MatrixExp(delta * Q)
+     alpha <- simMarkov(a0, n, S, P)
+     x <- numeric(n + 1)
+     x[1] <- x0
+     # Euler scheme with coefficients driven by the current state
+     for (i in 1:n) {
+         A <- f(x[i], alpha[i]) * delta
+         B <- g(x[i], alpha[i]) * sqrt(delta) * rnorm(1)
+         x[i + 1] <- x[i] + A + B
+     }
+     ts(x, deltat = delta, start = 0)
+ }
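The function simMSdiff relies on the matrix exponential of $\Delta Q$ to obtain the transition matrix of the chain sampled on the time grid. For a two-state chain this matrix has a simple closed form, which the sketch below compares with a truncated power series. This is a base-R illustration under assumed parameter values; `expmSeries` is a hypothetical helper and not a replacement for `MatrixExp`.

```r
# Two-state generator with rates L0 (leaving state 0) and L1 (leaving state 1)
L0 <- 1
L1 <- 2
delta <- 0.001
Q <- matrix(c(-L0, L1, L0, -L1), 2, 2)

# Closed form: exp(delta * Q) = Pi + exp(-(L0 + L1) * delta) * (I - Pi),
# where every row of Pi is the stationary distribution (L1, L0) / (L0 + L1)
Pi <- matrix(c(L1, L1, L0, L0), 2, 2) / (L0 + L1)
Pclosed <- Pi + exp(-(L0 + L1) * delta) * (diag(2) - Pi)

# Truncated power series of the matrix exponential, for comparison only
expmSeries <- function(A, terms = 30) {
    out <- diag(nrow(A))
    term <- diag(nrow(A))
    for (k in 1:terms) {
        term <- term %*% A / k
        out <- out + term
    }
    out
}
Pseries <- expmSeries(delta * Q)
max(abs(Pclosed - Pseries))
```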


After preparing all the code, we now need to specify the infinitesimal generator
$Q$, which in the two-state case is

$$Q = \begin{pmatrix} -\lambda_0 & \lambda_0 \\ \lambda_1 & -\lambda_1 \end{pmatrix}$$

and simulate the diffusion with Markov switching under the risk-neutral measure,
i.e. the process

$$\mathrm{d}S_t = (r - d_{\alpha_t}) S_t\,\mathrm{d}t + \sigma_{\alpha_t} S_t\,\mathrm{d}W_t$$

where $W_t$ is the Brownian motion under the martingale measure. The parameters
of the model are as follows: $T = 0.75$, $r = 0.05$, $\sigma_0 = 0.2$, $\sigma_1 = 0.4$, $\lambda_0 = \lambda_1 = 1$,
$d_0 = d_1 = 0$, $\pi_0 = 0.5$ and $K = 80$, $S_0 = 85$.

R> T <- 0.75
R> r <- 0.05
R> s0 <- 0.2
R> s1 <- 0.4
R> L0 <- 1
R> L1 <- 1
R> K <- 80
R> S0 <- 85
R> d1 <- 0
R> d0 <- 0
R> p0 <- 0.5
R> Q <- matrix(c(-L0, L1, L0, -L1), 2, 2)

Notice that for the infinitesimal generator we always have $\pi Q = 0$; indeed

R> c(p0, 1 - p0) %*% Q
     [,1] [,2]
[1,]    0    0

Now we prepare the drift and diffusion coefficients to pass to the function
simMSdiff.

R> f <- function(x, a) ifelse(a == 0, (r - d0) * x, (r - d1) * x)
R> g <- function(x, a) ifelse(a == 0, s0 * x, s1 * x)

and we simulate with a time mesh of $\Delta = 1/n$ with $n = 1000$. Remember that
the price of the call option is given by

$$V_0 = e^{-rT}\, \mathrm{E}_Q\{\max(S_T - K, 0)\}.$$

The algorithm is then very simple:

R> n <- 1000
R> set.seed(123)
R> nsim <- 1000
R> ST0 <- numeric(nsim)
R> for (i in 1:nsim) {
+     X <- simMSdiff(x0 = S0, a0 = 0, S = 0:1, delta = 1/n, n = T *
+         n, f, g, Q)
+     ST0[i] <- X[length(X)]
+ }
R> v0 <- exp(-r * T) * pmax(ST0 - K, 0)
R> mean(v0)
[1] 11.93979

which we compare with the price given by the approximation formula for $V_0$

R> mean(v0)
[1] 11.93979
R> V0(S0, K, T, r, s0, s1, L0, L1, p0)
[1] 11.97513

Similarly we proceed with $V_1$. The only difference is the starting state $\alpha_0 = 1$ of
the Markov chain $\alpha_t$. Therefore

R> set.seed(123)
R> ST1 <- numeric(nsim)
R> for (i in 1:nsim) {
+     X <- simMSdiff(x0 = S0, a0 = 1, S = 0:1, delta = 1/n, n = T *
+         n, f, g, Q)
+     ST1[i] <- X[length(X)]
+ }
R> v1 <- exp(-r * T) * pmax(ST1 - K, 0)
R> mean(v1)
[1] 13.88842
R> V1(S0, K, T, r, s0, s1, L0, L1, p0)
[1] 14.39285

Although the number of replications in the Monte Carlo pricing is small, the
prices are reasonably close. These algorithms are sufficiently general that one can
price options under Markov switching diffusion processes with any number of states and also
with nonlinear drift and diffusion coefficients $f(\cdot, \cdot)$ and $g(\cdot, \cdot)$.

8.3.2 Semi-Monte Carlo method 

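According to Buffington and Elliott (2002), a full Monte Carlo method is not
always needed if it is possible to generate a continuous time Markov chain $\alpha_t$. Indeed,
it is possible to use a Black and Scholes formula based on the fact that

$$S_t = S_0 \exp\{X_t\} = S_0 \exp\left\{ \int_0^t \left( r - d_{\alpha_u} - \frac{1}{2} \sigma^2_{\alpha_u} \right) \mathrm{d}u + \int_0^t \sigma_{\alpha_u}\,\mathrm{d}W_u \right\}$$

with $W_t$ the Brownian motion under the risk-neutral measure. The call option price
can be written as

$$C = \mathrm{E}\left\{ e^{-rT} \max(S_T - K, 0) \right\} = \mathrm{E}\left\{ \mathrm{E}\left[ e^{-rT} \max(S_T - K, 0) \mid \mathcal{F}_T \right] \right\}$$

where $\{\mathcal{F}_t, 0 \le t \le T\}$ is the $\sigma$-algebra generated by the Markov chain $\alpha_t$. Now,
as noticed by Liu et al. (2006), the conditional expectation can be calculated as
follows:

$$\mathrm{E}\left\{ e^{-rT} \max(S_T - K, 0) \mid \mathcal{F}_T \right\} = S_0 e^{-(rT - L_T)} \Phi(d_1) - K e^{-rT} \Phi(d_2)$$

where

$$d_1 = \frac{\log(S_0 / K) + L_T + \frac{1}{2} V_T}{\sqrt{V_T}}, \qquad d_2 = d_1 - \sqrt{V_T}$$

and

$$L_T = \int_0^T (r - d_{\alpha_u})\,\mathrm{d}u, \qquad V_T = \int_0^T \sigma^2_{\alpha_u}\,\mathrm{d}u.$$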

Once the trajectory of $\alpha_t$ is available, it is possible to calculate $L_T$ and $V_T$ directly
by discretization of the integrals. We now construct the function Vsmc, which
gives the semi-Monte Carlo price of the option for a Markov chain $\alpha_t$ with state
space S, generator Q and initial state a0. This time we use the package msm to
simulate a full path of the continuous time Markov chain instead of using the
$\epsilon$-skeleton approach.

R> Vsmc <- function(S0, K, T, r, a0, S, Q, mu, sigma, M = 1000) {
+     require(msm)
+     C <- numeric(M)
+     for (i in 1:M) {
+         # one path of the chain on [0, T]: jump times and visited states
+         MC <- sim.msm(Q, T, start = which(S == a0))
+         alpha <- S[MC$states]
+         t <- diff(MC$times)
+         s <- sapply(alpha[-length(alpha)], sigma)^2
+         m <- sapply(alpha[-length(alpha)], mu)
+         # integrated variance and drift along the path
+         VT <- sum(t * s)
+         LT <- sum(t * m)
+         d1 <- (log(S0/K) + LT + 0.5 * VT)/sqrt(VT)
+         d2 <- d1 - sqrt(VT)
+         RT <- r * T
+         C[i] <- S0 * exp(-(RT - LT)) * pnorm(d1) - K * exp(-RT) *
+             pnorm(d2)
+     }
+     mean(C)
+ }
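The conditional pricing step inside Vsmc can be isolated and tested on its own. The sketch below is only an illustration, with the hypothetical helper name `condPrice` and assumed parameter values: it takes the holding times and the visited states of one path as inputs, so that it does not depend on the msm package. When the volatility and the drift are the same in both states, it must return the Black and Scholes price whatever the path.

```r
# Conditional Black and Scholes price given one path of the chain.
# hold: holding times on [0, T]; states: visited states (0 or 1);
# mu and sigma: vectorized functions of the state.
condPrice <- function(S0, K, T, r, hold, states, mu, sigma) {
    VT <- sum(hold * sigma(states)^2)   # integrated variance
    LT <- sum(hold * mu(states))        # integrated drift
    d1 <- (log(S0 / K) + LT + 0.5 * VT) / sqrt(VT)
    d2 <- d1 - sqrt(VT)
    S0 * exp(-(r * T - LT)) * pnorm(d1) - K * exp(-r * T) * pnorm(d2)
}

# one path on [0, 1]: state 0 for 0.4 units of time, then state 1 for 0.6
price <- condPrice(100, 90, 1, 0.1, hold = c(0.4, 0.6), states = c(0, 1),
    mu = function(a) rep(0.1, length(a)),
    sigma = function(a) ifelse(a == 0, 0.2, 0.3))
price
```

Averaging `condPrice` over many simulated paths gives the semi-Monte Carlo estimator.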


Now we compare the semi-Monte Carlo price with the approximation formulas
(8.12) of $V_i$, $i = 0, 1$. We use the following parameters: $T = 1$, $r = 0.1$, $\sigma_0 = 0.2$,
$\sigma_1 = 0.3$, $\lambda_0 = \lambda_1 = 1$, $d_0 = d_1 = 0$, $K = 90$, $S_0 = 100$ as in the original work
of Liu et al. (2006).

R> T <- 1
R> r <- 0.1
R> s0 <- 0.2
R> s1 <- 0.3
R> L0 <- 1
R> L1 <- 1
R> K <- 90
R> S0 <- 100
R> d1 <- 0
R> d0 <- 0
R> p0 <- 0.5
R> Q <- matrix(c(-L0, L1, L0, -L1), 2, 2)
R> sigma <- function(a) ifelse(a == 0, s0, s1)
R> mu <- function(a) ifelse(a == 0, r - d0, r - d1)

Now we test the result for an initial state $\alpha_0 = 0$

R> Vsmc(S0, K, T, r, a0 = 0, 0:1, Q, mu, sigma)
[1] 20.71375
R> V0(S0, K, T, r, s0, s1, L0, L1, p0)
[1] 20.81154

and for an initial state $\alpha_0 = 1$

R> Vsmc(S0, K, T, r, a0 = 1, 0:1, Q, mu, sigma)
[1] 21.81279
R> V1(S0, K, T, r, s0, s1, L0, L1, p0)
[1] 21.73914

In both cases the results are close. Notice that this method is much faster than the pure Monte
Carlo method of the previous section, although it is limited to the standard linear
model, while the full Monte Carlo method works with any Markov switching
diffusion.


8.3.3 Pricing with the Fast Fourier Transform 

In the same work of Liu et al. (2006), it is possible to find the explicit formulas
for the characteristic functions of $V_0$ and $V_1$ which are needed to perform option
pricing using the FFT algorithm. We briefly recall only the formulas and the
basic idea because pricing by FFT is the same as what has already been treated
in Section 8.1.5. Let $\rho > 0$ denote the dampening factor and $k = \log(K)$. By Carr
and Madan (1998) we know that the call price formula can be given as

$$c(k) = e^{\rho k}\, \frac{C(K)}{S_0}$$

and the objective is to derive a formula for the characteristic function $\psi(u)$
of $c(k)$. The formula is the following:

$$\psi(u) = \frac{\mathrm{E}\left( e^{-rT} \phi_T(u - i(1 + \rho)) \right)}{\rho^2 + \rho - u^2 + i(1 + 2\rho)u}$$

where $\phi_T(u) = \mathrm{E}\{ e^{iuX_T} \mid \mathcal{F}_T \}$ is the conditional characteristic function of $X_T$ given
$\{\mathcal{F}_t, 0 \le t \le T\}$, the $\sigma$-algebra generated by $\alpha_t$, and $X_t$ is as in the previous section.
Conditionally on $\mathcal{F}_T$, $X_T$ is a Gaussian random variable with mean $L_T - \frac{1}{2} V_T$ and variance $V_T$,
hence:

$$\phi_T(u) = \exp\left\{ iu \left( L_T - \frac{1}{2} V_T \right) - \frac{1}{2} u^2 V_T \right\}.$$

Putting all terms together we obtain

$$\psi(u) = \frac{\mathrm{E}\left( \exp\left\{ (1 + \rho) \left( L_T + \frac{1}{2} \rho V_T \right) - rT - \frac{1}{2} u^2 V_T + iu \left( L_T + \left( \frac{1}{2} + \rho \right) V_T \right) \right\} \right)}{\rho^2 + \rho - u^2 + i(1 + 2\rho)u}.$$
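The algebra behind the combined exponent can be checked numerically with complex arithmetic. In the sketch below the values of $L_T$, $V_T$, $\rho$ and $u$ are arbitrary assumptions and `phiT` is a hypothetical helper; the code evaluates the Gaussian characteristic function at the shifted argument $u - i(1 + \rho)$ and compares it with the expanded exponent.

```r
# Conditional characteristic function of X_T given the path of the chain:
# X_T is Gaussian with mean LT - VT / 2 and variance VT
phiT <- function(u, LT, VT) exp(1i * u * (LT - 0.5 * VT) - 0.5 * u^2 * VT)

# arbitrary illustrative values (assumptions)
rho <- 1.5
r <- 0.1
T <- 1
LT <- 0.08
VT <- 0.05
u <- 0.7

# numerator written with the shifted characteristic function
lhs <- exp(-r * T) * phiT(u - 1i * (1 + rho), LT, VT)
# numerator written with the expanded exponent
rhs <- exp((1 + rho) * (LT + 0.5 * rho * VT) - r * T - 0.5 * u^2 * VT +
    1i * u * (LT + (0.5 + rho) * VT))
Mod(lhs - rhs)
```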

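Taking into account a Markov chain with only two states, it is possible to obtain
the following explicit formula for $\psi(u)$

$$\psi(u) = \frac{\exp\{B(u) T\}\, \phi_{\alpha_0}(A(u), T)}{\rho^2 + \rho - u^2 + i(1 + 2\rho)u}$$

where

$$A(u) = \left( (d_1 - d_0) + \left( \frac{1}{2} + \rho \right) (\sigma_0^2 - \sigma_1^2) \right) u + \frac{1}{2} u^2 (\sigma_0^2 - \sigma_1^2)\, i + \left( (1 + \rho)(d_0 - d_1) - \frac{1}{2} \rho (1 + \rho)(\sigma_0^2 - \sigma_1^2) \right) i,$$

$$B(u) = iu \left( r - d_1 + \left( \frac{1}{2} + \rho \right) \sigma_1^2 \right) - \frac{1}{2} u^2 \sigma_1^2 + (1 + \rho)(r - d_1) - r + \frac{1}{2} \rho (1 + \rho) \sigma_1^2$$

and

$$\phi_0(\theta, T) = \frac{1}{s_1 - s_2} \left\{ (s_1 + \lambda_0 + \lambda_1) e^{s_1 T} - (s_2 + \lambda_0 + \lambda_1) e^{s_2 T} \right\},$$

$$\phi_1(\theta, T) = \frac{1}{s_1 - s_2} \left\{ (s_1 + \lambda_0 + \lambda_1 - i\theta) e^{s_1 T} - (s_2 + \lambda_0 + \lambda_1 - i\theta) e^{s_2 T} \right\},$$

with $s_1$ and $s_2$ the two roots of the equation

$$s^2 + (\lambda_0 + \lambda_1 - i\theta) s - i\theta\lambda_1 = 0.$$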
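The functions $\phi_0$ and $\phi_1$ are straightforward to code with complex arithmetic. The following sketch uses a hypothetical function name `phiOcc` and assumed parameter values, and takes the roots from the quadratic formula. As a plausibility check one can verify that the function equals one at $\theta = 0$ and at $T = 0$.

```r
# Characteristic function of the time spent in state 0 on [0, T],
# starting from state a0 (0 or 1). s1 and s2 are the roots of
# s^2 + (L0 + L1 - i * theta) * s - i * theta * L1 = 0.
phiOcc <- function(theta, T, L0, L1, a0 = 0) {
    b <- as.complex(L0 + L1 - 1i * theta)
    disc <- sqrt(b^2 + 4i * theta * L1)
    s1 <- (-b + disc) / 2
    s2 <- (-b - disc) / 2
    shift <- if (a0 == 0) L0 + L1 else L0 + L1 - 1i * theta
    ((s1 + shift) * exp(s1 * T) - (s2 + shift) * exp(s2 * T)) / (s1 - s2)
}

# illustrative values (assumptions)
phiOcc(0.7, 1, L0 = 1, L1 = 2, a0 = 0)
phiOcc(0.7, 1, L0 = 1, L1 = 2, a0 = 1)
```

The argument `theta` may itself be complex, which is what is needed when the function is evaluated at $A(u)$.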


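Although they appear complicated, these formulas are explicit and the FFT algorithm
can be applied taking into account (Liu et al. 2006) that

$$c(k_l) \approx \frac{\Delta_u}{\pi} \operatorname{Re} \sum_{j=0}^{N-1} e^{-i \frac{2\pi}{N} j l}\, e^{i j \pi}\, \psi(j \Delta_u)\, w(j), \qquad l = 0, 1, \ldots, N - 1,$$

with $k_l = (l - N/2) \Delta_k$, $l = 0, 1, \ldots, N - 1$, $\Delta_u$ the grid size for variable $u$, with
$u_j = j \Delta_u$, $j = 0, 1, \ldots, N - 1$, $\Delta_u \Delta_k = 2\pi/N$ and a set of weights

$$w(j) = \begin{cases} \frac{1}{3}, & j = 0, \\ \frac{4}{3}, & j \text{ is odd}, \\ \frac{2}{3}, & j \text{ is even}. \end{cases}$$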

In summary, fft is applied to the sequence $e^{ij\pi} \psi(j \Delta_u) w(j)$, $j = 0, \ldots, N - 1$.

8.3.4 Other applications of Markov switching diffusion models

Di Masi et al. (1994) considered a model similar to (8.9) where only the volatility 
is affected by Markov switching. The model presented in this section is more 
general in that the drift is also controlled by the underlying Markov process. The 
model in Di Masi et al. (1994) falls in the category of stochastic volatility models 
which are not treated in this book. A good starting point for reading is Shephard 
(2005). Under the diffusion model with Markov switching it is also possible to
price other types of options, such as perpetual lookback options, Russian
options and perpetual American options, as shown in Guo (1999) and Buffington and
Elliott (2002). Optimal stock trading rules under this model have been developed
in Zhang (2001). Estimation of this model has been considered under different
situations and approaches. In Elliott et al. (2008) moment type estimation is
considered, while Fearnhead et al. (2008) considered particle filters and Hahn and
Sass (2009) the Bayesian approach.

8.4 The benchmark approach 

We have seen so far that the main ingredients of option pricing are the existence
of a martingale measure, which implies non-arbitrage, and the ability to
hedge risk, which implies market completeness. In some cases, like the Lévy
market, there exists an infinite set of equivalent martingale measures and thus
there exists more than one fair price, depending on which measure the market
chooses. In other cases, like markets governed by the jump telegraph process
or Markov switching diffusions, there are additional sources of noise so that
completeness is ensured only after the addition of special conditions
like the introduction of COS securities. But in all models, the pricing requires
the existence of a martingale measure, which is more a technical tool than an 
economic need. Other approaches exist like the so-called benchmark approach 
by Platen and David (2006) and Platen and Bruti-Liberati (2010). We cannot go 
into too much detail, but it is at least worth mentioning the basic facts of this 
approach. The main object in this theory is not the martingale measure but the 
benchmark. The benchmark has the function of numeraire with respect to which
all derivatives and other financial products should be priced. All pricing occurs
under the real measure $P$ and not under a martingale measure $Q$, and this links
more closely the pricing task with the part of the theory related to estimation 
of the model’s parameters from historical data. The key ingredient is the growth 
optimal portfolio GOP which plays the role of the reference unit or numeraire 
in this market. The GOP is a portfolio that maximizes the expected logarithmic 
utility from the terminal wealth, see Kelly (1956) and Long (1990). The idea is 
to target the market on a long run period. With this in mind, the search is for 
a strictly positive process, like a market index, which when used as numeraire 
or benchmark, generates benchmarked price processes that become martingales 
under the physical measure. This implies that the benchmarked derivative price process
represents the best forecast of its future benchmarked values. Let us now see
what happens in the standard Black and Scholes market under this approach. For
this market we already know how to derive the fair (non-arbitrage) price under
the equivalent martingale measure. As usual, we take the geometric Brownian
motion for the asset price dynamics

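$$\mathrm{d}S_t = \mu S_t\,\mathrm{d}t + \sigma S_t\,\mathrm{d}B_t$$

with some initial condition $S_0 > 0$, and $\sigma > 0$. For the savings account (the nonrisky
process) we consider

$$\mathrm{d}R_t = r R_t\,\mathrm{d}t, \qquad R_0 = 1.$$

We denote the self-financing strategy as $\delta = (a_t, b_t)$, $t \ge 0$, where $a_t$ is the number of units of
$S_t$ held at time $t$ and $b_t$ the number of units invested in the savings account. The value of
the portfolio under this strategy is

$$V^\delta_t = a_t S_t + b_t R_t$$

with the self-financing structure

$$\begin{aligned}
\mathrm{d}V^\delta_t &= a_t\,\mathrm{d}S_t + b_t\,\mathrm{d}R_t \\
&= a_t \mu S_t\,\mathrm{d}t + a_t \sigma S_t\,\mathrm{d}B_t + b_t r R_t\,\mathrm{d}t \\
&= (b_t r R_t + a_t \mu S_t)\,\mathrm{d}t + a_t \sigma S_t\,\mathrm{d}B_t \\
&= V^\delta_t \left\{ \left( \pi^0_\delta(t) r + \pi^1_\delta(t) \mu \right) \mathrm{d}t + \pi^1_\delta(t) \sigma\,\mathrm{d}B_t \right\}
\end{aligned}$$

where $\pi^0_\delta(t)$ and $\pi^1_\delta(t)$ are the fractions that are held in the respective securities

$$\pi^0_\delta(t) = b_t \frac{R_t}{V^\delta_t}, \qquad \pi^1_\delta(t) = a_t \frac{S_t}{V^\delta_t}$$

and $\pi^0_\delta(t) + \pi^1_\delta(t) = 1$, for $t \in [0, T]$. Clearly the fractions make sense as long
as the value of the portfolio $V^\delta_t$ is not zero. The growth optimal portfolio (GOP) is
the portfolio that maximizes the drift of the logarithm of the value of the portfolio
for any time horizon. By the Itô formula we can write the stochastic differential
equation for $\log V^\delta_t$, which reads as

$$\mathrm{d}\log V^\delta_t = g^\delta_t\,\mathrm{d}t + \pi^1_\delta(t) \sigma\,\mathrm{d}B_t$$

with growth rate

$$g^\delta_t = r + \pi^1_\delta(t)(\mu - r) - \frac{1}{2} \left( \pi^1_\delta(t) \right)^2 \sigma^2, \qquad t \in [0, T].$$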

Definition 8.4.1 Under the Black and Scholes model, the GOP is the portfolio
$V^{\delta^*} = \{V^{\delta^*}_t, 0 \le t \le T\}$ with optimal growth rate $g^{\delta^*}_t$ at time $t$ such that

$$g^\delta_t \le g^{\delta^*}_t$$

almost surely for all $t \in [0, T]$ and all strictly positive portfolio processes $V^\delta_t$.

For this model it is easy to obtain the optimal growth rate, which is the growth rate of the
portfolio with optimal fraction $\pi^1_{\delta^*}(t)$. We differentiate the growth rate to find the
maximum

$$\frac{\partial}{\partial \pi^1_\delta(t)} g^\delta_t = \mu - r - \pi^1_\delta(t) \sigma^2 = 0, \qquad t \in [0, T].$$

Therefore, the optimal fraction is given by

$$\pi^1_{\delta^*}(t) = \frac{\mu - r}{\sigma^2}$$

and

$$\pi^0_{\delta^*}(t) = 1 - \pi^1_{\delta^*}(t), \qquad t \in [0, T].$$

Finally, the optimal growth rate is given by the formula

$$g^{\delta^*}_t = r + \frac{1}{2} \left( \frac{\mu - r}{\sigma} \right)^2.$$

Now we can substitute the optimal fraction into the stochastic differential equation of $V^\delta_t$
and obtain

$$\mathrm{d}V^{\delta^*}_t = V^{\delta^*}_t \left( (r + \theta_t^2)\,\mathrm{d}t + \theta_t\,\mathrm{d}B_t \right), \qquad V^{\delta^*}_0 > 0,$$

with GOP volatility

$$\theta_t = \pi^1_{\delta^*}(t)\, \sigma = \frac{\mu - r}{\sigma}, \qquad t \in [0, T].$$

The quantity $\theta_t$ is called the market price of risk at time $t$. Now, given all the
above, the optimal growth rate for the Black and Scholes model is

$$g^{\delta^*}_t = r + \frac{1}{2} \theta_t^2, \qquad t \in [0, T].$$
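The maximization of the growth rate can also be carried out numerically, which gives a simple check of the closed-form optimal fraction and of the optimal growth rate. The parameter values in the sketch below are illustrative assumptions.

```r
# illustrative values (assumptions)
mu <- 0.08
r <- 0.03
sigma <- 0.2

# growth rate of a portfolio holding the fraction p in the risky asset
g <- function(p) r + p * (mu - r) - 0.5 * p^2 * sigma^2

# numerical maximization over a wide interval of fractions
opt <- optimize(g, interval = c(-5, 5), maximum = TRUE)

theta <- (mu - r) / sigma   # market price of risk
c(numerical = opt$maximum, closedForm = (mu - r) / sigma^2)
c(numerical = opt$objective, closedForm = r + 0.5 * theta^2)
```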

We now introduce the discounted GOP

$$\bar{V}^{\delta^*}_t = \frac{V^{\delta^*}_t}{R_t}$$

which, again using the Itô formula, is the solution of the following stochastic differential
equation

$$\mathrm{d}\bar{V}^{\delta^*}_t = \bar{V}^{\delta^*}_t\, \theta_t \left( \theta_t\,\mathrm{d}t + \mathrm{d}B_t \right), \qquad t \in [0, T].$$

Notice that for this model the drift is determined by the square of its diffusion
coefficient.

8.4.1 Benchmarking of the savings account

We now benchmark the savings account and the risky asset. The benchmarked
savings account is $\hat{R}^0 = \{\hat{R}^0_t, 0 \le t \le T\}$ where

$$\hat{R}^0_t = \frac{R_t}{V^{\delta^*}_t}, \qquad t \in [0, T].$$

Again, by the Itô formula we get

$$\mathrm{d}\hat{R}^0_t = -\theta_t \hat{R}^0_t\,\mathrm{d}B_t.$$

This process is clearly a martingale because its solution is just an Itô integral.
So the first effect we notice is that benchmarking the savings account returns a
martingale under the physical measure.

8.4.2 Benchmarking of the risky asset 

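We now perform benchmarking of the underlying asset price process:

$$\hat{S}^1_t = \frac{S_t}{V^{\delta^*}_t}, \qquad t \in [0, T].$$

Again, by the Itô formula and recalling that $\theta_t = (\mu - r)/\sigma$, we get:

$$\begin{aligned}
\mathrm{d}\hat{S}^1_t &= \hat{S}^1_t (\mu - r - \sigma \theta_t)\,\mathrm{d}t + \hat{S}^1_t (\sigma - \theta_t)\,\mathrm{d}B_t \\
&= \hat{S}^1_t (\sigma - \theta_t)\,\mathrm{d}B_t
\end{aligned}$$

which is again a martingale under the physical measure.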
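The martingale property of the benchmarked savings account and of the benchmarked risky asset can be illustrated by simulation under the physical measure. The sketch below uses the explicit solutions of the geometric Brownian motion and of the GOP equation, both driven by the same Brownian motion; the parameter values and the number of replications are assumptions.

```r
set.seed(1)
# illustrative values (assumptions)
mu <- 0.08
r <- 0.03
sigma <- 0.2
T <- 1
S0 <- 100
V0 <- 1
n <- 1e5
theta <- (mu - r) / sigma

BT <- sqrt(T) * rnorm(n)                                # Brownian motion at T under P
ST <- S0 * exp((mu - 0.5 * sigma^2) * T + sigma * BT)   # risky asset
VT <- V0 * exp((r + 0.5 * theta^2) * T + theta * BT)    # growth optimal portfolio
RT <- exp(r * T)                                        # savings account

# sample means of the benchmarked values against their initial values
benchS <- c(mean(ST / VT), S0 / V0)
benchR <- c(mean(RT / VT), 1 / V0)
rbind(benchS, benchR)
```

No change of measure is involved: all random variables are simulated with the real-world drift $\mu$.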


8.4.3 Benchmarking the option price 

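Now, let $P_t = P(t, S_t)$ be the price of, e.g., a European option. We can introduce
the benchmarked price

$$\hat{P}_t = \frac{P_t}{V^{\delta^*}_t} = \frac{\bar{P}(t, \bar{S}_t)}{\bar{V}^{\delta^*}_t}$$

where $\bar{P}(t, \bar{S}_t) = P(t, S_t)/R_t$ and $\bar{S}_t = S_t/R_t$. Therefore, the benchmarked payoff
has the form

$$\hat{P}_T = \frac{H(S_T)}{V^{\delta^*}_T}.$$

We now apply the Itô formula to derive an expression for the stochastic differential
equation of $\bar{P}(t, \bar{S}_t)$:

$$\begin{aligned}
\mathrm{d}\bar{P}(t, \bar{S}_t) &= \left( \frac{\partial}{\partial t} \bar{P}(t, \bar{S}_t) + (\mu - r) \bar{S}_t \frac{\partial}{\partial x} \bar{P}(t, \bar{S}_t) + \frac{1}{2} \sigma^2 \bar{S}_t^2 \frac{\partial^2}{\partial x^2} \bar{P}(t, \bar{S}_t) \right) \mathrm{d}t + \sigma \bar{S}_t \frac{\partial}{\partial x} \bar{P}(t, \bar{S}_t)\,\mathrm{d}B_t \\
&= \frac{\partial}{\partial x} \bar{P}(t, \bar{S}_t)\, \bar{S}_t \left( (\mu - r)\,\mathrm{d}t + \sigma\,\mathrm{d}B_t \right),
\end{aligned}$$

where the last equality uses the Black and Scholes equation written for the discounted
price, $\frac{\partial}{\partial t} \bar{P} + \frac{1}{2} \sigma^2 x^2 \frac{\partial^2}{\partial x^2} \bar{P} = 0$.
Similar calculations lead to the following stochastic differential equation for $\hat{P}_t$

$$\mathrm{d}\hat{P}_t = \hat{R}^0_t \left( \sigma \bar{S}_t \frac{\partial}{\partial x} \bar{P}(t, \bar{S}_t) - \theta_t \bar{P}(t, \bar{S}_t) \right) \mathrm{d}B_t, \qquad t \in [0, T].$$

Notice that the benchmarked option price has no drift term.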

In summary, the GOP has the property that, if used as numeraire or benchmark,
it transforms all related processes into martingales. For this reason the GOP
is called in the literature the numeraire portfolio, see Long (1990). Now we can
introduce the notion of fair price under this setup.


Definition 8.4.2 A price process $\{P_t, 0 \le t \le T\}$ is called fair if its benchmarked
value $\hat{P}_t = P_t / V^{\delta^*}_t$ is a martingale under the physical measure.


Therefore, the following price formula is available

$$P_t = V^{\delta^*}_t\, \mathrm{E}\left\{ \frac{H(S_T)}{V^{\delta^*}_T} \,\middle|\, \mathcal{F}_t \right\}$$

where $\mathcal{F} = \{\mathcal{F}_t, 0 \le t \le T\}$ is the filtration under the real world measure $P$.


8.4.4 Martingale representation of the option price process 

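The stochastic differential equation of the benchmarked option price process $\hat{P}_t$
can be rewritten in this integral form:

$$\hat{P}_T = \hat{P}_t + \int_t^T \hat{R}^0_u \left( \sigma \bar{S}_u \frac{\partial}{\partial x} \bar{P}(u, \bar{S}_u) - \theta_u \bar{P}(u, \bar{S}_u) \right) \mathrm{d}B_u$$

and then the benchmarked payoff can be written also in this way

$$\hat{P}_T = \frac{H(S_T)}{V^{\delta^*}_T}.$$

Since $\hat{P}$ is a martingale, we have that

$$\hat{P}_t = \mathrm{E}\left\{ \hat{P}_T \mid \mathcal{F}_t \right\};$$

now multiplying both sides of the above equation by $V^{\delta^*}_t$ we get

$$P_t = V^{\delta^*}_t \hat{P}_t = V^{\delta^*}_t\, \mathrm{E}\left\{ \hat{P}_T \mid \mathcal{F}_t \right\},$$

therefore

$$P_t = V^{\delta^*}_t\, \mathrm{E}\left\{ \frac{H(S_T)}{V^{\delta^*}_T} \,\middle|\, \mathcal{F}_t \right\}, \qquad t \in [0, T].$$

This is a pricing formula ready to be used once the growth optimal portfolio
(GOP) $V^{\delta^*}_t$ is available. Platen and David (2006) show how to determine
the GOP in practice and extend the above result to a variety of financial
applications (not just option pricing) and also to jump models.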
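In the Black and Scholes market the real-world pricing formula can be tried out directly, because the GOP has an explicit solution. The sketch below is a Monte Carlo illustration with assumed parameter values: it prices a European call at $t = 0$ as $V^{\delta^*}_0$ times the real-world expectation of the benchmarked payoff and compares the result with the usual Black and Scholes price.

```r
set.seed(2)
# illustrative values (assumptions)
mu <- 0.08
r <- 0.03
sigma <- 0.2
T <- 1
S0 <- 100
K <- 100
V0 <- 1
n <- 2e5
theta <- (mu - r) / sigma

# asset and GOP at maturity, simulated under the physical measure
BT <- sqrt(T) * rnorm(n)
ST <- S0 * exp((mu - 0.5 * sigma^2) * T + sigma * BT)
VT <- V0 * exp((r + 0.5 * theta^2) * T + theta * BT)

# real-world pricing formula at t = 0 with payoff H(x) = max(x - K, 0)
P0 <- V0 * mean(pmax(ST - K, 0) / VT)

# Black and Scholes reference price
d1 <- (log(S0 / K) + (r + 0.5 * sigma^2) * T) / (sigma * sqrt(T))
bs <- S0 * pnorm(d1) - K * exp(-r * T) * pnorm(d1 - sigma * sqrt(T))
c(realWorld = P0, blackScholes = bs)
```

The drift $\mu$ enters the simulation but not the resulting price, which is one way to see that the two approaches agree in this complete market.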


8.5 Bibliographical notes 

There is a vast literature on pricing models beyond the Black and Scholes model.
In this chapter we presented a few examples, some of which, like the exponential
Lévy market model, are used more than others in practice. In particular,
for this model one should mention at least the following references: Schoutens
(2003), Cont and Tankov (2004), Jondeau et al. (2007), Di Nunno et al. (2009)
and Platen and Bruti-Liberati (2010). Models based on the telegraph process
have been studied in Di Masi et al. (1994), Di Crescenzo and Pellerey (2002),
Ratanov (2005a,b, 2007a,b). For the Markov switching diffusion process one
can start from the following papers: Guo (1999), Buffington and Elliott (2002),
Zhang (2001) and references therein. The benchmark approach is mostly treated
in Platen and David (2006) and Platen and Bruti-Liberati (2010).


9


Miscellanea 


9.1 Monitoring of the volatility 

We have seen that the volatility of the market or of the asset prices plays a crucial
role in many aspects. One of the underlying assumptions in the standard Black
and Scholes market of Chapter 6 is that volatility is constant. We
have seen many examples (see, e.g., Section 6.6) showing that this assumption
is simply unrealistic when we analyze real financial data. We have
also seen deviations from the standard geometric Brownian motion model which
allow for nonconstant volatility. Change point analysis was initially introduced in
the framework of independent and identically distributed data by
Hinkley (1971), Csörgő and Horváth (1997), Inclán and Tiao (1994) and Bai (1994,
1997), and was quickly applied to the analysis of time series by Kim et al. (2000), Lee
et al. (2000) and Chen et al. (2005). For continuous time diffusion models, Kutoyants
(1994, 2004) and Lee et al. (2006) studied change points in the drift term from
continuous time observations. Because volatility can be estimated
without error in continuous time, change point analysis in this setup is not
very interesting. Recently, De Gregorio and Iacus (2008) considered least squares
estimation for the volatility of a one-dimensional stochastic differential equation
and later Iacus and Yoshida (2009) considered the problem under the general setup
of multidimensional Ito processes also observed in discrete time. Although we 
will only consider the problem of change point estimation, we mention that 
Song and Lee (2009) proposed the CUSUM test statistics to discover structural 
change points for one-dimensional diffusion processes. We briefly recall both 
least squares and maximum likelihood estimation approach. 
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9.1.1 The least squares approach 

In Section 6.6 we used, without details, the cpoint function to discover
change points in the volatility of geometric Brownian motion. We now describe
the precise setup. Let $X = \{X_t, 0 \le t \le T\}$ be a diffusion process, with state
space $\mathcal{X} = (l, r)$, $-\infty \le l < r \le +\infty$, such that

$$X_t = \begin{cases} x_0 + \int_0^t b(X_s)\,ds + \int_0^t \sqrt{\theta_1}\,\sigma(X_s)\,dB_s, & 0 \le t \le \tau^*, \\ X_{\tau^*} + \int_{\tau^*}^t b(X_s)\,ds + \int_{\tau^*}^t \sqrt{\theta_2}\,\sigma(X_s)\,dB_s, & \tau^* < t \le T, \end{cases} \tag{9.1}$$

with $X_0 = x_0$, $0 < \theta_1, \theta_2 < \infty$ and $\{B_t, t \ge 0\}$ a standard Brownian motion. The
value $\tau^* \in (0, T)$ is the change point instant. The parameters $\theta_1$ and $\theta_2$ belong
to $\Theta$, a compact set of $\mathbb{R}^+$. The coefficients $b : \mathcal{X} \to \mathbb{R}$ and $\sigma : \mathcal{X} \to (0, \infty)$
are supposed to be known, continuous with continuous derivatives and regular
so that (9.1) is well defined and the process $X$ is unique and
possesses the ergodic property. Let

$$s(x) = \exp\left\{ -\int_{x_0}^{x} \frac{2 b(u)}{\sigma^2(u)}\,du \right\}$$

be the scale function (where $x_0$ is an arbitrary point inside $\mathcal{X}$). The following
condition will be required throughout this section:

$$\lim_{x_1 \to l} \int_{x_1}^{x} s(u)\,du = +\infty, \qquad \lim_{x_2 \to r} \int_{x}^{x_2} s(u)\,du = +\infty, \tag{9.2}$$

where $l < x_1 < x < x_2 < r$. Condition (9.2) guarantees that the exit time from $\mathcal{X}$
is infinite (Karatzas and Shreve 1991). We later assume that $b(\cdot)$ is unknown and
we will make use of nonparametric estimators. The process $X$ is supposed to be
observed at $n + 1$ equidistant discrete times $0 = t_0 < t_1 < \ldots < t_n = T$, with
$t_i = i\Delta_n$. For the sake of simplicity we assume $T = 1$ and, with a little abuse of
notation, we will write $X_i$ instead of $X_{t_i}$ and $B_i$ instead of $B_{t_i}$. The asymptotic
framework is a high frequency scheme: $n \to \infty$, $\Delta_n \to 0$ with $n\Delta_n = T$ and $T$ fixed.
Given the observations $X_i$, $i = 0, 1, \ldots, n$, the aim is to estimate the change
time $\tau^*$ as well as the two parameters $\theta_1$, $\theta_2$. The solution of this problem is an
adaptation of the least squares approach of Bai (1994) for autoregressive models.
By the Euler scheme, we first construct the standardized residuals

$$Z_i = \frac{X_{i+1} - X_i - b(X_i)\Delta_n}{\sqrt{\Delta_n}\,\sigma(X_i)}, \qquad i = 1, \ldots, n.$$

Let us denote by $k_0 = [n\tau^*]$ and $k = [n\tau]$, $\tau, \tau^* \in (0, 1)$, where $[x]$ is the
integer part of the real value $x$. The least squares estimator of the change point
is given by

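$$\hat k_0 = \arg\min_k \left( \min_{\theta_1, \theta_2} \left\{ \sum_{i=1}^{k} (Z_i^2 - \theta_1)^2 + \sum_{i=k+1}^{n} (Z_i^2 - \theta_2)^2 \right\} \right) = \arg\min_k \left\{ \sum_{i=1}^{k} (Z_i^2 - \bar\theta_1)^2 + \sum_{i=k+1}^{n} (Z_i^2 - \bar\theta_2)^2 \right\} \tag{9.3}$$

where

$$\bar\theta_1 = \frac{S_k}{k} = \frac{1}{k} \sum_{i=1}^{k} Z_i^2 \qquad \text{and} \qquad \bar\theta_2 = \frac{S^*_{n-k}}{n-k} = \frac{1}{n-k} \sum_{i=k+1}^{n} Z_i^2.$$

It is easy to show that the problem (9.3) is equivalent to the following

$$\hat k_0 = \arg\max_k |D_k| \tag{9.4}$$

where

$$D_k = \frac{k}{n} - \frac{S_k}{S_n}.$$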

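Once $\hat k_0$ has been obtained, the following estimators of the parameters $\theta_1$ and $\theta_2$
can be used:

$$\hat\theta_1 = \frac{S_{\hat k_0}}{\hat k_0}, \qquad \hat\theta_2 = \frac{S^*_{n - \hat k_0}}{n - \hat k_0},$$

with $S_k = \sum_{i=1}^{k} Z_i^2$ and $S^*_{n-k} = \sum_{i=k+1}^{n} Z_i^2$.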

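Denote by $W(v)$ the two-sided Brownian motion, i.e.

$$W(v) = \begin{cases} B_1(-v), & v < 0 \\ B_2(v), & v \ge 0 \end{cases} \tag{9.5}$$

where $B_1(t)$, $B_2(t)$, $t \ge 0$, are two independent Brownian motions. De Gregorio
and Iacus (2008) show the following asymptotic results. Let $\vartheta_n = |\theta_2 - \theta_1| \ne 0$
for finite $n$. Under the additional condition that $\vartheta_n \to 0$ and
$\sqrt{n}\,\vartheta_n / \sqrt{\log n} \to \infty$, the
change point estimator $\hat\tau_n = \hat k_0 / n$ is consistent and such that

$$\frac{n \vartheta_n^2 (\hat\tau_n - \tau^*)}{2 \hat\theta^2} \xrightarrow{d} \arg\max_v \left\{ W(v) - \frac{|v|}{2} \right\} \tag{9.6}$$

for any consistent estimator $\hat\theta$ for the common limiting value $\theta_0$ of $\theta_1$ and $\theta_2$.
The condition $\vartheta_n \to 0$ corresponds to the setup of contiguous alternatives, i.e.
the two parameters $\theta_1 = \theta_1(n)$ and $\theta_2 = \theta_2(n)$ are allowed to be closer and closer
as the sample size increases. Thus, in order to discriminate the two regimes a
sufficiently large number of observations $n$ (or rate of convergence $\vartheta_n$) is required.
Under the conditions above, the estimators $\hat\theta_1$, $\hat\theta_2$ are $\sqrt{n}$-consistent and such that

$$\sqrt{n} \begin{pmatrix} \hat\theta_1 - \theta_1 \\ \hat\theta_2 - \theta_2 \end{pmatrix} \xrightarrow{d} N(0, \Sigma), \qquad \text{where } \Sigma = \begin{pmatrix} \dfrac{2\theta_0^2}{\tau^*} & 0 \\ 0 & \dfrac{2\theta_0^2}{1 - \tau^*} \end{pmatrix}.$$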

The above results hold in the high frequency case $\Delta_n \to 0$ with $n \to \infty$ and
$n\Delta_n = T$ fixed, but also for the rapidly increasing design case, i.e. $n\Delta_n^2 \to 0$,
$n\Delta_n = T \to \infty$.

In most cases, the drift coefficient is not known or is treated as a nuisance term
in the statistical model. If we assume $T \to \infty$, then it is possible to estimate
the drift coefficient consistently, see, e.g. Iacus (2008). The cpoint function in
the sde package implements the nonparametric drift estimator proposed in Bandi
and Phillips (2003), although similar alternative nonparametric estimators can
be used in this setup. The reader might want to consider the results in earlier
papers by Pham (1981), Florens-Zmirou (1993) or Stanton (1997). Let $K \ge 0$
be a kernel function, i.e. $K$ is symmetric and continuously differentiable, with
$\int_{\mathbb{R}} u K(u)\,du = 0$, $\int_{\mathbb{R}} K^2(u)\,du < \infty$ and such that $\int_{\mathbb{R}} K(u)\,du = 1$. We plug into
the objective function (9.3) these new standardized residuals

$$\hat Z_i = \frac{X_{i+1} - X_i - \hat b(X_i)\Delta_n}{\sqrt{\Delta_n}}$$

where

$$\hat b(x) = \frac{\sum_{i=0}^{n-1} K\left(\frac{x - X_i}{h_n}\right) \frac{X_{i+1} - X_i}{\Delta_n}}{\sum_{i=0}^{n-1} K\left(\frac{x - X_i}{h_n}\right)}$$

is our nonparametric estimator of the drift. Then, the change point estimator is
obtained by maximizing $|D_k|$ as in the case of known drift. This mix of
parametric and nonparametric estimation is quite useful in applications because
in practice there is no need to fully specify the data generating model for the
observed data. The change point analysis for this reduced model identifies a
change in the scale (or intensity) of the volatility levels. De Gregorio and Iacus
(2008) also discuss the choice of the bandwidth $h_n$, which
is implemented in the cpoint function.
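
The following base R sketch illustrates the idea of plugging a kernel estimate of the drift into the residuals. It is not the implementation found in cpoint: the Gaussian kernel, the rule-of-thumb bandwidth, the simulated model (drift $1 - x$ with a volatility that doubles halfway through the sample) and the function name nw_drift are all assumptions made for this example.

```r
# Kernel (Nadaraya-Watson type) drift estimate and change point on the residuals.
nw_drift <- function(x, X, Delta, h) {
  n <- length(X) - 1
  incr <- diff(X) / Delta                    # (X_{i+1} - X_i) / Delta
  sapply(x, function(x0) {
    w <- dnorm((X[1:n] - x0) / h)            # Gaussian kernel weights
    sum(w * incr) / sum(w)
  })
}

set.seed(1)
n <- 2000; Delta <- 0.01
sig <- c(rep(0.2, n / 2), rep(0.4, n / 2))   # volatility change at i = 1000
X <- numeric(n + 1); X[1] <- 1
for (i in 1:n)                               # Euler scheme
  X[i + 1] <- X[i] + (1 - X[i]) * Delta + sig[i] * sqrt(Delta) * rnorm(1)

h <- sd(X) * n^(-1/5)                        # hypothetical bandwidth choice
b_hat <- nw_drift(X[1:n], X, Delta, h)
Z <- (diff(X) - b_hat * Delta) / sqrt(Delta) # residuals with estimated drift

S <- cumsum(Z^2)
k <- 1:(n - 1)
k0 <- which.max(abs(k / n - S[k] / S[n]))    # change point via |D_k|
k0
```

Since the drift contribution to each increment is of order $\Delta_n$ while the diffusion part is of order $\sqrt{\Delta_n}$, even a rough drift estimate is usually enough for the change point to be located.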

9.1.2 Analysis of multiple change points 

Usually, change point statistics identify the largest change point, but it may be
that other change points exist in the series. One way to handle this
is the following: once the largest change point has been detected, the time series
is split into two subseries and the analysis of structural breaks is pursued on
each subseries separately. The next code analyzes the volatility of the AAPL stock
in this way. We first discover the largest change point

R> require(sde)
R> require(fImport)
R> Delta <- 1/252
R> S <- yahooSeries("AAPL", from = "2009-01-01", to = "2009-12-31")
R> Close <- S[, "AAPL.Close"]
R> sqrt(var(returns(Close))/Delta)

           AAPL.Close
AAPL.Close  0.3327552

R> Close <- rev(Close)
R> cp <- cpoint(as.ts(Close))
R> cp

$k0
[1] 201

$tau0
[1] 201

$theta1
[1] 2.510733

$theta2
[1] 3.486225

then we split the time series in two parts and evaluate the volatility individually
assuming that the underlying model is the geometric Brownian motion:

$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$$

thus we do not consider the estimates of $\theta_1$ and $\theta_2$ provided by the software for
the model

$$dX_t = b(X_t)\,dt + \theta\,dB_t.$$


R> Close1 <- Close[time(Close)[1:cp$k0], ]
R> Close2 <- Close[time(Close)[-(1:cp$k0)], ]
R> sqrt(var(returns(Close1))/Delta)

           AAPL.Close
AAPL.Close  0.3442099

R> sqrt(var(returns(Close2))/Delta)

           AAPL.Close
AAPL.Close  0.2704417

Now we repeat the analysis on the left-hand time series:

R> cp2 <- cpoint(as.ts(Close1))
R> cp2

$k0
[1] 99

$tau0
[1] 99

$theta1
[1] 2.756492

$theta2
[1] 2.240476

and we plot the change points against the returns of the asset
in Figure 9.1.

R> plot(returns(Close), theme = "white", ylab = "AAPL Returns",
+     main = "", xlab = "")
R> abline(v = time(Close)[cp$k0], lty = 3)
R> abline(v = time(Close1)[cp2$k0], lty = 3)
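
The split-and-repeat procedure used above can be automated. The sketch below is a base R illustration on simulated returns, not a feature of the sde package: the function names are hypothetical, and the stopping rule (a minimum segment length and the 5% critical value 1.358 for $\sqrt{n/2}\,\max_k |D_k|$, which is appropriate for Gaussian data in the spirit of Inclán and Tiao 1994) is an assumption made for this example.

```r
# Recursive binary segmentation of squared returns based on D_k.
max_D <- function(z2) {
  n <- length(z2)
  S <- cumsum(z2)
  k <- seq_len(n - 1)
  D <- abs(k / n - S[k] / S[n])
  list(k0 = which.max(D), stat = sqrt(n / 2) * max(D))
}

split_cp <- function(z2, offset = 0, min_len = 50, thr = 1.358) {
  if (length(z2) < 2 * min_len) return(integer(0))
  cp <- max_D(z2)
  too_close <- cp$k0 < min_len || length(z2) - cp$k0 < min_len
  if (cp$stat < thr || too_close) return(integer(0))
  c(split_cp(z2[1:cp$k0], offset, min_len, thr),          # left subseries
    offset + cp$k0,                                       # change point found here
    split_cp(z2[-(1:cp$k0)], offset + cp$k0, min_len, thr))  # right subseries
}

set.seed(42)
ret <- c(rnorm(300, sd = 0.01), rnorm(300, sd = 0.03), rnorm(300, sd = 0.01))
cps <- split_cp(ret^2)
cps
```

The recursion returns the change points in increasing order, expressed as indices of the original series thanks to the offset argument.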


9.1.3 An example of real-time analysis 

The change point statistic presented is a retrospective tool, i.e. once the data are
collected we look back for the change point instant. This is very useful to
calibrate financial models in the sense that, instead of taking the whole trajectory
of an asset price, this tool permits us to extract only the most recent observations
with more homogeneous volatility. In other applications, it is instead interesting
to use change point statistics for real-time monitoring. Figure 9.2 contains the
results of a real-time change point analysis of the recent financial crisis. By
real time it is meant that the analysis proceeds as follows: only data up to the last
week of June 2008 are considered and change points are estimated; then the data are
extended by one week and a new change point analysis is performed; then
the third week is added, and so forth. As soon as change points were found, they
were kept (and added to the plot in Figure 9.2). The 'official' week of the total
collapse is represented in a different color.

Figure 9.1 First two change points for the AAPL time series.

[Figure 9.2: timeline of weekly windows between June 30 and September 26, each
listing the indexes and stocks for which a change point was detected.]

Figure 9.2 Summary of a crude real-time structural change point analysis of
several financial markets and assets. Change-point analysis seems to be able
to capture structural breakpoints not only retrospectively but also in real time.
Source: Smaldone (2009).

The experiment in Smaldone (2009) involved the change point analysis of several
market indexes including NYSE, Dow Jones, S&P 500 and Nasdaq; Dow Jones Stoxx 600,
Nikkei, Dow Jones Global 1800, MSCI World, FTSE (UK), DAX (Germany), S&P/Mib
(Italy), CAC (France), Ibex (Spain) and SMI (Switzerland); then several indexes
for the banking sector, like Dow Jones Stoxx Global 1800 Banks (worldwide), Dow
Jones Stoxx Americas 600 Banks (USA), Dow Jones Stoxx Asia-Pacific 600 Banks
(Asia) and Dow Jones Stoxx 600 Banks (Europe); and, further, a number of individual
stock prices from the USA, UK, Italy, France, Germany, Spain and Japan stock
exchanges. The evidence from this analysis is interesting in showing how the
crisis (measured here by uncertainty) affects the different markets at different
dates (e.g. more protected markets, like the Italian one, react slowly to the
crisis). Although this real-time approach cannot be considered reliable in
general, our experience says that real-time monitoring by change point tools is
worth considering, possibly along with the monitoring of other indexes.
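
The expanding-window scheme described in this section can be sketched as follows. This is not the procedure of Smaldone (2009), only a base R illustration on simulated returns: blocks of five observations stand in for weeks, the function name monitor_cp is hypothetical, and the detection threshold 1.358 on $\sqrt{m/2}\,\max_k |D_k|$ is an assumption that presumes Gaussian returns.

```r
# Expanding-window monitoring: re-run the change point analysis as data arrive.
monitor_cp <- function(z2, start = 100, step = 5, thr = 1.358) {
  found <- NULL
  for (m in seq(start, length(z2), by = step)) {
    S <- cumsum(z2[1:m])
    k <- 1:(m - 1)
    D <- abs(k / m - S[k] / S[m])
    if (sqrt(m / 2) * max(D) > thr)          # keep every detected change point
      found <- rbind(found, data.frame(seen_at = m, k0 = which.max(D)))
  }
  found
}

set.seed(7)
ret <- c(rnorm(250, sd = 0.01), rnorm(100, sd = 0.03))   # break after 250 obs
alarms <- monitor_cp(ret^2)
head(alarms)
tail(alarms, 1)
```

Each row records the sample size at which the analysis was run and the change point located at that time, so the delay between the true break and its first detection can be read off directly.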

9.1.4 More general quasi maximum likelihood approach 

Iacus and Yoshida (2009) extended the result on change point analysis for the
volatility to general $d$-dimensional Itô processes described by stochastic
differential equations of this form:

$$dY_t = b_t\,dt + \sigma(X_t, \theta)\,dB_t, \qquad t \in [0, T], \tag{9.7}$$

where $B_t$ is an $r$-dimensional standard Brownian motion, $b_t$ and $X_t$ are vector
valued processes, and $\sigma(x, \theta)$ is a matrix valued function. The parameter $\theta$
belongs to $\Theta$ which is a bounded domain in $\mathbb{R}^{d_0}$, $d_0 \ge 1$. As in the above, it is
assumed that there is a time $\tau^*$ across which the diffusion coefficient changes
from $\sigma(x, \theta_1)$ to $\sigma(x, \theta_2)$. More precisely, $(Y, X)$ satisfy the following stochastic
integral equation:

$$Y_t = \begin{cases} Y_0 + \int_0^t b_s\,ds + \int_0^t \sigma(X_s, \theta_1)\,dB_s & \text{for } t \in [0, \tau^*) \\ Y_{\tau^*} + \int_{\tau^*}^t b_s\,ds + \int_{\tau^*}^t \sigma(X_s, \theta_2)\,dB_s & \text{for } t \in [\tau^*, T]. \end{cases}$$

The change point $\tau^* \in (0, T)$ is unknown and is to be estimated from the
observations sampled from the path of $(X, Y)$. As before, $\mathcal{X}$ denotes the state
space of $X$. The coefficient $\sigma(x, \theta)$ is assumed to be known up to the parameter $\theta$,
while $b_t$ is completely unknown and unobservable, therefore possibly depending
on $\theta$ and $\tau^*$. In this framework the interest is purely on $\tau^*$ and the estimation
of $\theta$ is secondary. We assume that consistent estimators exist for $\theta_k$ and later
describe how to obtain them. The results are not affected by this assumption. Note
that diffusion models are included in this framework by simply taking $Y = X$ in
Equation (9.7). The sample consists of $(X_{t_i}, Y_{t_i})$, $i = 0, 1, \ldots, n$, where $t_i = i\Delta_n$
for $\Delta = \Delta_n = T/n$. The time $T$ is fixed, so this scheme is purely high frequency
and asymptotic results are of mixed normal type because the asymptotic Fisher
information for the model will be a random matrix. Denote by $\theta_k^*$ the true value
of $\theta_k$ for $k = 1, 2$ and $\vartheta_n = |\theta_2^* - \theta_1^*|$. In order to obtain the asymptotic results
some conditions should be further assumed on the regularity of the coefficient
$\sigma$, on the continuity of the trajectories of the coordinate process $X$ and on the
behaviour of the process $b_t$, i.e. the process itself cannot be too irregular if
changes in the structure are to be discovered. The process $b_t$ can have jumps but
these must be controlled to avoid that they are interpreted as change points. The
conditions expressing the above remarks on the regularity of the processes involved are
rather technical and we do not present them here, but the reader can refer to the
original paper. The only condition that needs to be made explicit concerns, as in
the one-dimensional case of Section 9.1.1, the contiguity of the two parameters
$\theta_1$ and $\theta_2$. In particular, we assume that $\theta_1^*$ and $\theta_2^*$ depend on $n$, and as $n \to \infty$,

$$\theta_1^* \to \theta^* \in \Theta, \qquad \vartheta_n \to 0, \qquad \text{and} \qquad n\vartheta_n^2 \to \infty. \tag{9.8}$$

We need some additional notation. For a matrix $A$, we denote by $A^T$ the transpose
of $A$, by $\mathrm{Tr}(A)$ the trace of $A$, $A^{\otimes 2} = A \cdot A^T$ and by $A^{-1}$ the inverse of $A$. Let
$S(x, \theta) = \sigma(x, \theta)^{\otimes 2}$.

9.1.5 Construction of the quasi-MLE 

We now explain how to construct the quasi-MLE (i.e., quasi maximum likelihood
estimator) of the change point. Let $\Delta_i Y = Y_{t_i} - Y_{t_{i-1}}$ and define

$$\Phi_n(t; \theta_1, \theta_2) = \sum_{i=1}^{[nt/T]} G_i(\theta_1) + \sum_{i=[nt/T]+1}^{n} G_i(\theta_2), \tag{9.9}$$

with

$$G_i(\theta) = \log\det S(X_{t_{i-1}}, \theta) + \Delta_n^{-1} (\Delta_i Y)^T S(X_{t_{i-1}}, \theta)^{-1} (\Delta_i Y). \tag{9.10}$$

The contrast function in (9.9) is a version of the one in Genon-Catalot and Jacod
(1993). Suppose that there exists an estimator $\hat\theta_k$ for each $\theta_k$, $k = 1, 2$. In case
the $\theta_k^*$ are known, we define $\hat\theta_k$ just as $\hat\theta_k = \theta_k^*$. The change point estimator of $\tau^*$ is

$$\hat\tau_n = \arg\min_{t \in [0, T]} \Phi_n(t; \hat\theta_1, \hat\theta_2).$$

The estimator $\hat\tau_n$ has the same structure as the estimator $\hat k_0$ in Equation (9.3). It
is possible to prove that $\hat\tau_n$ is consistent for $\tau^*$ and the rate of convergence is
of order $n\vartheta_n^2$. Now, let us assume that the following limit random variable

$$\eta = \lim_{n \to \infty} \vartheta_n^{-1} (\theta_2^* - \theta_1^*)$$

exists. Let $\Xi$ be the positive-definite matrix

$$\Xi(x, \theta) = \left( \mathrm{Tr}\left( (\partial_{\theta^{(i_1)}} S) S^{-1} (\partial_{\theta^{(i_2)}} S) S^{-1} \right)(x, \theta) \right)_{i_1, i_2 = 1}^{d_0}, \qquad \theta = (\theta^{(i)}).$$

Define further,

$$H(v) = -2\left( \Gamma_\eta^{1/2}\, W(v) - \frac{1}{2} \Gamma_\eta |v| \right)$$

for $\Gamma_\eta = (2T)^{-1} \eta^T \Xi(X_{\tau^*}, \theta^*) \eta$, where $W$ is the multidimensional version of the
two-sided Brownian motion in (9.5) and independent of $X_{\tau^*}$. Then,

$$n\vartheta_n^2 (\hat\tau_n - \tau^*) \xrightarrow{d} \arg\min_{v \in \mathbb{R}} H(v)$$

as $n \to \infty$. This result is equivalent to (9.6) although in the present case, the
double-sided Brownian motion $W$ is premultiplied by the random Fisher information
$\Gamma_\eta$, so this result involving a mixed normal limit holds under stable convergence.
In order to use it in practice a studentization procedure is required, i.e. the
quantity $n\vartheta_n^2 (\hat\tau_n - \tau^*)$ has to be normalized by the inverse of $\Gamma_\eta$ evaluated at the
change point estimator $\hat\tau_n$. In this case, the limit above is more similar to the
one of (9.6). The joint convergence of the normalized $\hat\tau_n$ and $X_{\tau^*}$ is indeed also
proved in Iacus and Yoshida (2009).
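
To make the contrast function concrete, the sketch below evaluates (9.9) and (9.10) on a grid for a scalar example. It is an illustration under simplifying assumptions and not the yuima implementation: we take $Y = X$, $\sigma(x, \theta) = \theta x$ so that $S(x, \theta) = \theta^2 x^2$, a hypothetical drift $1 - x$, and we treat $\theta_1 = 0.1$ and $\theta_2 = 0.6$ as known.

```r
# Quasi-likelihood contrast for a scalar model with a change at tau* = 4.
set.seed(10)
n <- 1000; Tm <- 10; Delta <- Tm / n
theta <- c(rep(0.1, 400), rep(0.6, 600))
Y <- numeric(n + 1); Y[1] <- 1
for (i in 1:n)                                # Euler scheme
  Y[i + 1] <- Y[i] + (1 - Y[i]) * Delta +
    theta[i] * Y[i] * sqrt(Delta) * rnorm(1)

G <- function(th) {                           # G_i(theta), i = 1, ..., n
  S <- (th * Y[1:n])^2                        # S(X_{t_{i-1}}, theta)
  log(S) + diff(Y)^2 / (Delta * S)
}

contrast <- function(th1, th2) {              # Phi_n at t = k * Delta, k = 1, ..., n - 1
  g1 <- cumsum(G(th1))                        # sum_{i <= k} G_i(theta1)
  g2 <- rev(cumsum(rev(G(th2))))              # sum_{i >= j} G_i(theta2)
  g1[1:(n - 1)] + g2[2:n]
}

k_hat <- which.min(contrast(0.1, 0.6))
tau_hat <- k_hat * Delta
tau_hat
```

The drift does not enter the contrast at all, which is the point of the quasi-likelihood construction: only the squared increments and the diffusion coefficient are used.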

9.1.6 A modified quasi-MLE 

The quasi-likelihood approach does not consider the drift term in the estimation.
For small sample sizes it is possible to construct a new contrast function

$$\Phi'_n(t; \theta_1, \theta_2) = \sum_{i=1}^{[nt/2T]} G'_i(\theta_1) + \sum_{i=[nt/2T]+1}^{n} G'_i(\theta_2), \tag{9.11}$$

where the term $\Delta_i Y$ in $G_i(\theta)$ of (9.10) is replaced by

$$\Delta'_i Y = \frac{Y_{t_{2i+1}} - 2 Y_{t_{2i}} + Y_{t_{2i-1}}}{\sqrt{2}}$$

and

$$G'_i(\theta) = \log\det S(X_{t_{2i-1}}, \theta) + \Delta_n^{-1} (\Delta'_i Y)^T S(X_{t_{2i-1}}, \theta)^{-1} (\Delta'_i Y). \tag{9.12}$$

The change point estimator has the same asymptotic properties as the one in the
previous section but it behaves better in finite samples. Indeed, the use of
$\Delta'_i Y$ has the effect of compensating for the unknown drift and eliminates the
initial condition $Y_0$, because the term $\Delta'_i Y$ reduces to

$$\frac{\left( \int_{t_{2i}}^{t_{2i+1}} b_s\,ds - \int_{t_{2i-1}}^{t_{2i}} b_s\,ds \right) + \left( \int_{t_{2i}}^{t_{2i+1}} \sigma(X_s, \theta)\,dB_s - \int_{t_{2i-1}}^{t_{2i}} \sigma(X_s, \theta)\,dB_s \right)}{\sqrt{2}}.$$
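
The compensation effect can be checked numerically. In the base R sketch below, which assumes a constant drift $b$ and a constant volatility $\sigma$ chosen only for illustration, the usual increments overestimate $\sigma^2$ by about $b^2 \Delta_n$, while the modified increments $\Delta'_i Y$ do not.

```r
# Effect of the modified increments on the estimation of sigma^2.
set.seed(3)
n <- 2000; Delta <- 0.01; b <- 5; sigma <- 0.5
# Y[j + 1] stores Y_{t_j}, j = 0, ..., n (R vectors are 1-based)
Y <- cumsum(c(0, b * Delta + sigma * sqrt(Delta) * rnorm(n)))

i <- 1:(n / 2 - 1)                        # so that 2i + 1 <= n
dY_mod <- (Y[2 * i + 2] - 2 * Y[2 * i + 1] + Y[2 * i]) / sqrt(2)
dY_raw <- diff(Y)

raw_est <- mean(dY_raw^2) / Delta         # targets sigma^2 + b^2 * Delta = 0.50
mod_est <- mean(dY_mod^2) / Delta         # targets sigma^2 = 0.25
c(raw = raw_est, modified = mod_est)
```

The price to pay is that only about half as many terms enter the sum, which is why the modification is mostly of interest when the drift is large relative to the sampling step.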


9.1.7 First- and second-stage estimators 

In the previous construction, it is supposed that $\theta_1$ and $\theta_2$ are known and it was
anticipated that the results still hold if consistent estimators exist. Indeed, it is
possible to construct consistent estimators of $\theta_k$, $k = 1, 2$, using the data over the
time interval $[0, a_n]$ for $k = 1$ and the data over $[T - a_n, T]$ for $k = 2$, respectively,
for some sequence $a_n$ tending to zero. Although the details can be found in the
original paper, the intuition is that, in order to have consistent estimators, the
sequence $a_n$ must be such that $n a_n \to \infty$ because this will guarantee that we
have a sufficient number of observations for the asymptotic result to work. It
is also possible to consider two separate sequences, $a_n^{(1)} \to 0$ for the estimation of
$\theta_1$ and $a_n^{(2)} \to 0$ for the estimation of $\theta_2$. The quasi-MLE estimators are obtained
using the contrast function

$$\hat\theta_{1,n} = \arg\min_{\theta_1} \sum_{i=1}^{[n a_n]} G_i(\theta_1) \qquad \text{and} \qquad \hat\theta_{2,n} = \arg\min_{\theta_2} \sum_{i=[n(1 - a_n)]}^{n} G_i(\theta_2).$$

Once these first-stage estimators are obtained, it is possible to construct the
first-stage estimator $\hat\tau_n$ of the change point as follows:

$$\hat\tau_n = \arg\min_t \Phi_n(t; \hat\theta_{1,n}, \hat\theta_{2,n}).$$

It is now possible to iterate the procedure to obtain second-stage estimators.
Indeed, with the initial first-stage change point estimator $\hat\tau_n$ we can estimate $\theta_1$
and $\theta_2$ more accurately as follows:

$$\check\theta_{1,n} = \arg\min_{\theta_1} \sum_{i=1}^{[n \hat\tau_n]} G_i(\theta_1) \qquad \text{and} \qquad \check\theta_{2,n} = \arg\min_{\theta_2} \sum_{i=[n \hat\tau_n]+1}^{n} G_i(\theta_2)$$

and finally obtain the second-stage estimator of $\tau$, i.e.

$$\check\tau_n = \arg\min_t \Phi_n(t; \check\theta_{1,n}, \check\theta_{2,n}).$$

The same procedure is possible with the modified contrast function $\Phi'_n$ in (9.11).


9.1.8 Numerical example 

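The change point statistic based on the contrast function (9.9) is implemented in
package yuima via the function CPoint. We now show an example of use with
the two-step approach. Let us consider the two-dimensional diffusion process
$X_t = (X_t^1, X_t^2)$ solution to the stochastic differential equation

$$\begin{pmatrix} dX_t^1 \\ dX_t^2 \end{pmatrix} = \begin{pmatrix} 1 - X_t^1 \\ 3 - X_t^2 \end{pmatrix} dt + \begin{pmatrix} \theta_{1.1} \cdot X_t^1 & 0 \cdot X_t^1 \\ 0 \cdot X_t^2 & \theta_{1.2} \cdot X_t^2 \end{pmatrix} \begin{pmatrix} dW_t^1 \\ dW_t^2 \end{pmatrix}, \qquad X^1(0) = 1, \quad X^2(0) = 1.$$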

We first describe this model in R using the yuima package 

R> library(yuima)
R> diff.matrix <- matrix(c("theta1.1*x1", "0*x2", "0*x1", "theta1.2*x2"),
+     2, 2)
R> drift.c <- c("1-x1", "3-x2")
R> drift.matrix <- matrix(drift.c, 2, 1)
R> ymodel <- setModel(drift = drift.matrix, diffusion = diff.matrix,
+     time.variable = "t", state.variable = c("x1", "x2"),
+     solve.variable = c("x1", "x2"))



and then simulate two trajectories: one up to the change point $\tau = 4$ with
parameters $\theta_{1.1} = 0.1$ and $\theta_{1.2} = 0.2$, and a second trajectory with parameters
$\theta_{1.1} = 0.6$ and $\theta_{1.2} = 0.6$. For the second trajectory, the initial value is set to
the last value of the first trajectory.

R> n <- 1000
R> set.seed(123)
R> t1 <- list(theta1.1 = 0.1, theta1.2 = 0.2)
R> t2 <- list(theta1.1 = 0.6, theta1.2 = 0.6)
R> tau <- 0.4
R> ysamp1 <- setSampling(n = tau * n, Initial = 0, delta = 0.01)
YUIMA: 'Terminal' (re)defined.
R> yuima1 <- setYuima(model = ymodel, sampling = ysamp1)
R> yuima1 <- simulate(yuima1, xinit = c(1, 1), true.parameter = t1)
R> x1 <- yuima1@data@zoo.data[[1]]
R> x1 <- as.numeric(x1[length(x1)])
R> x2 <- yuima1@data@zoo.data[[2]]
R> x2 <- as.numeric(x2[length(x2)])
R> ysamp2 <- setSampling(Initial = n * tau * 0.01, n = n * (1 -
+     tau), delta = 0.01)
YUIMA: 'Terminal' (re)defined.
R> yuima2 <- setYuima(model = ymodel, sampling = ysamp2)
R> yuima2 <- simulate(yuima2, xinit = c(x1, x2), true.parameter = t2)
R> yuima <- yuima1
R> yuima@data@zoo.data[[1]] <- c(yuima1@data@zoo.data[[1]],
+     yuima2@data@zoo.data[[1]][-1])
R> yuima@data@zoo.data[[2]] <- c(yuima1@data@zoo.data[[2]],
+     yuima2@data@zoo.data[[2]][-1])

The composed trajectory is visible in Figure 9.3. We first test the ability of the
change point estimator to identify $\tau$ when the true values of the parameters are given

R> t.est <- CPoint(yuima, param1 = t1, param2 = t2, plot = TRUE)
R> t.est$tau

[1] 3.99



Figure 9.3 An example of two-dimensional trajectory with change point at
$\tau = 4$.

Now we proceed with a two-stage estimation approach. We first estimate the
parameters $\theta_{1.1}$ and $\theta_{1.2}$ before and after the change point $\tau$ using the observations
in the tails

R> low <- list(theta1.1 = 0, theta1.2 = 0)
R> tmp1 <- qmleL(yuima, start = list(theta1.1 = 0.3, theta1.2 = 0.5),
+     t = 1.5, lower = low, method = "L-BFGS-B")
R> tmp2 <- qmleR(yuima, start = list(theta1.1 = 0.3, theta1.2 = 0.5),
+     t = 8.5, lower = low, method = "L-BFGS-B")
R> coef(tmp1)

 theta1.1  theta1.2
0.0946981 0.1913319

R> coef(tmp2)

 theta1.1  theta1.2
0.7429437 0.8549956

and obtain the first-stage change point estimator

R> t.est1 <- CPoint(yuima, param1 = coef(tmp1), param2 = coef(tmp2))
R> t.est1$tau

[1] 3.99

With this first change point estimator, we estimate again the parameters in the
diffusion matrix

R> tmp11 <- qmleL(yuima, start = as.list(coef(tmp1)), t = t.est1$tau -
+     0.1, lower = low, method = "L-BFGS-B")
YUIMA: attempting to coerce 'grid' to a list, unexpected
results may occur!
R> coef(tmp11)

  theta1.1   theta1.2
0.09653777 0.20041300

R> tmp21 <- qmleR(yuima, start = as.list(coef(tmp2)), t = t.est1$tau +
+     0.1, lower = low, method = "L-BFGS-B")
YUIMA: attempting to coerce 'grid' to a list, unexpected
results may occur!
R> coef(tmp21)

 theta1.1  theta1.2
0.7829717 0.7924929

and finally calculate the second-stage estimator of the change point using the
second-stage estimators of the parameters

R> t.est2 <- CPoint(yuima, param1 = coef(tmp11), param2 = coef(tmp21))
R> t.est2$tau

[1] 3.99

The same analysis can be performed using the modified contrast function (9.12)
in yuima using the option symmetrized = TRUE in the function CPoint.


9.2 Asynchronous covariation estimation 

Suppose that two Itô processes are observed only at discrete times in an
asynchronous manner, as usually happens in practice with high frequency data. We
are interested in estimating the covariance of the two processes accurately in
such a situation. Let $T \in (0, \infty)$ be a terminal time for possible observations.
We consider a two-dimensional Itô process $(X^1, X^2)$ satisfying the stochastic
differential equations

$$dX_t^i = \mu_t^i\,dt + \sigma_t^i\,dW_t^i, \quad t \in [0, T], \qquad X_0^i = x_0^i$$

for $i = 1, 2$. Here $W^i$ denote standard Wiener processes with a progressively
measurable correlation process $d\langle W^1, W^2 \rangle_t = \rho_t\,dt$, $\mu_t^i$ and $\sigma_t^i$ are progressively
measurable processes, and $x_0^i$ are initial random variables independent of $(W^1, W^2)$.
Estimation of covariation between two diffusion processes is our goal, but the
formulation in the above terms allows for more sophisticated stochastic structures.
The theory presented in this section has been developed in a series of papers
by Hayashi and Yoshida (2005, 2008) and is still under development to handle
microstructure noise issues.

The process $X^i$ is supposed to be observed over the increasing sequence
of times $T^{i,k}$ ($k = 0, 1, \ldots$) starting at 0, up to time $T$. Thus, the observable
quantities are $(T^{i,k}, X^i_{T^{i,k}})$ with $T^{i,k} \le T$. Each $T^{i,k}$ is allowed to be a stopping
time, so it possibly depends on the history of $(X^1, X^2)$ as well as on the preceding
stopping times. The two sequences of stopping times $T^{1,k}$ and $T^{2,j}$ are asynchronous,
and irregularly spaced, in general. Figure 9.4 shows a typical example of irregular
sampling where the intervals $I^k$ and $J^j$ are the intervals determined respectively
by subsequent elements of the sequences of random times $T^{1,k}$ and $T^{2,j}$.

Figure 9.4 Example of asynchronous random sampling for a two-dimensional
Itô process.


The parameter of interest is the quadratic covariation between $X^1$ and $X^2$
defined as follows:

$$\theta = \langle X^1, X^2 \rangle_T = \int_0^T \sigma_t^1 \sigma_t^2 \rho_t\,dt. \tag{9.13}$$

The target variable $\theta$ is random in the general setup but in our examples it will be
a constant quantity. This quantity can be estimated with the asynchronous
covariance estimator, also called the Hayashi-Yoshida estimator, defined as follows:

$$U_n = \sum_{i, j \,:\, T^{1,i} \le T,\; T^{2,j} \le T} \left( X^1_{T^{1,i}} - X^1_{T^{1,i-1}} \right) \left( X^2_{T^{2,j}} - X^2_{T^{2,j-1}} \right) 1_{\{I^i \cap J^j \ne \emptyset\}}. \tag{9.14}$$

That is, the product of any pair of increments $(X^1_{T^{1,i}} - X^1_{T^{1,i-1}})$ and
$(X^2_{T^{2,j}} - X^2_{T^{2,j-1}})$ will make a contribution to the sum only when the respective
observation intervals $I^i = (T^{1,i-1}, T^{1,i}]$ and $J^j = (T^{2,j-1}, T^{2,j}]$ are overlapping
with each other as in Figure 9.4. The estimator $U_n$ is consistent and possesses
an asymptotically mixed normal distribution as $n \to \infty$ if the maximum length
between two consecutive observing times tends to 0. This is an important result
for covariance estimation because, before this result was proved, the usual
covariance estimator suffered from the so-called Epps effect (Epps 1979). Epps showed
that stock return correlations decrease as the sampling frequency of data increases.
Since his discovery the phenomenon has been detected in several studies of
different stock markets and foreign exchange markets. With the usual covariance
estimator in high resolution data the correlations are significantly smaller than
their asymptotic value as observed on daily data. Hayashi and Yoshida (2005)
show indeed that even under very mild assumptions, the usual covariance estimator
is biased and inconsistent. In fact, under certain assumptions, it converges to zero.
The Hayashi-Yoshida estimator is not affected by this Epps effect.
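
The definition (9.14) translates almost literally into code. The base R sketch below is only an illustration and not the cce function of yuima: the function name hy_cov is hypothetical, the two Brownian motions have a constant correlation of 0.5 on $[0, 1]$, and the observation times of each component are drawn at random from a fine grid, which we assume is a reasonable stand-in for asynchronous sampling.

```r
# Hayashi-Yoshida estimator computed directly from its definition.
hy_cov <- function(t1, x1, t2, x2) {
  d1 <- diff(x1); d2 <- diff(x2)
  total <- 0
  for (i in seq_along(d1)) {
    # (t1[i], t1[i+1]] and (t2[j], t2[j+1]] overlap iff
    # t2[j] < t1[i+1] and t2[j+1] > t1[i]
    ov <- (t2[-length(t2)] < t1[i + 1]) & (t2[-1] > t1[i])
    total <- total + d1[i] * sum(d2[ov])
  }
  total
}

set.seed(11)
N <- 20000; Tm <- 1; rho <- 0.5
grid <- seq(0, Tm, length.out = N + 1)
dW1 <- sqrt(Tm / N) * rnorm(N)
dW2 <- rho * dW1 + sqrt(1 - rho^2) * sqrt(Tm / N) * rnorm(N)
X1 <- c(0, cumsum(dW1)); X2 <- c(0, cumsum(dW2))   # true covariation: rho * Tm

s1 <- c(1, sort(sample(2:N, 2000)), N + 1)         # observation indices of X1
s2 <- c(1, sort(sample(2:N, 3000)), N + 1)         # observation indices of X2
hy <- hy_cov(grid[s1], X1[s1], grid[s2], X2[s2])
hy
```

No synchronization or interpolation of the two series is needed, which is precisely what protects the estimator from the bias that affects the usual covariance estimator. The double scan costs a number of operations proportional to the product of the two sample sizes; it is kept here for readability.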












9.2.1 Numerical example 


The Hayashi-Yoshida estimator is implemented in the yuima package in the cce
function as well as in the realized package in the function rc.hy. We now apply
the cce function to asynchronous high-frequency simulated data. As an example,
consider a two-dimensional stochastic process $(X_t^1, X_t^2)$ satisfying the stochastic
differential equation

$$dX_t^1 = \sigma_{1,t}\,dB_t^1, \qquad dX_t^2 = \sigma_{2,t}\,dB_t^2. \tag{9.15}$$

Here $B_t^1$ and $B_t^2$ denote two standard Wiener processes; however we take them
correlated in the following way:

$$B_t^1 = W_t^1, \qquad B_t^2 = \int_0^t \rho_s\,dW_s^1 + \int_0^t \sqrt{1 - \rho_s^2}\,dW_s^2, \tag{9.16}$$

where $W_t^1$ and $W_t^2$ are independent Wiener processes, and $\rho_t$ is the correlation
function between $B_t^1$ and $B_t^2$. We consider $\sigma_{i,t}$, $i = 1, 2$ and $\rho_t$ of the following
form in this example:

$$\sigma_{1,t} = \sqrt{1 + t}, \qquad \sigma_{2,t} = \sqrt{1 + t^2}, \qquad \rho_t = \frac{1}{\sqrt{2}}. \tag{9.17}$$

The parameter we want to estimate is the quadratic covariation between $X^1$
and $X^2$, which for $T = 1$ is

$$\theta = \langle X^1, X^2 \rangle_T = \int_0^T \sigma_{1,t}\,\sigma_{2,t}\,\rho_t\,dt \approx 1. \tag{9.18}$$


So we first build the model within the yuima package 


R> diff1 <- function(t, x1 = 0, x2 = 0) sqrt(1 + t)
R> diff2 <- function(t, x1 = 0, x2 = 0) sqrt(1 + t^2)
R> rho <- function(t, x1 = 0, x2 = 0) sqrt(1/2)
R> diff.matrix <- matrix(c("diff1(t,x1,x2)", "diff2(t,x1,x2) * rho(t,x1,x2)",
+     "", "diff2(t,x1,x2) * sqrt(1-rho(t,x1,x2)^2)"), 2, 2)
R> cor.mod <- setModel(drift = c("", ""), diffusion = diff.matrix,
+     solve.variable = c("x1", "x2"))


and prepare a function true.theta to calculate numerically the true value of $\theta$

R> true.theta <- function(T, sigma1, sigma2, rho) {
+     f <- function(t) {
+         sigma1(t) * sigma2(t) * rho(t)
+     }
+     integrate(f, 0, T)
+ }

so that it is possible to play with the model to test different situations. For the
sampling scheme, we will consider two independent Poisson samplings. That is,
each configuration of the sampling times $T^{i,k}$ is realized as a Poisson random
measure with intensity $n p_i$, and the two random measures are independent of each
other as well as of the stochastic processes. Under this particular random sampling
scheme, it is known from Hayashi and Yoshida (2005, 2008) that

$$n^{1/2} (U_n - \theta) \to N(0, c), \tag{9.19}$$

as $n \to \infty$, where

$$c = \left( \frac{2}{p_1} + \frac{2}{p_2} \right) \int_0^T (\sigma_{1,t}\,\sigma_{2,t})^2\,dt + \left( \frac{2}{p_1} + \frac{2}{p_2} - \frac{2}{p_1 + p_2} \right) \int_0^T (\sigma_{1,t}\,\sigma_{2,t}\,\rho_t)^2\,dt \tag{9.20}$$

and hence it is possible to estimate the asymptotic variance of the covariance
estimator. So we simulate the model

R> set.seed(123)
R> Terminal <- 1
R> n <- 1000
R> yuima.samp <- setSampling(Terminal = Terminal, n = n)
YUIMA: 'delta' (re)defined.
R> yuima <- setYuima(model = cor.mod, sampling = yuima.samp)
R> X <- simulate(yuima)
R> theta <- true.theta(T = Terminal, sigma1 = diff1, sigma2 = diff2,
+     rho = rho)$value
R> theta

[1] 0.9995767

We calculate the covariance from the complete synchronous series 

R> cce(X)$covmat[1, 2] 

[1] 1.086078 

and now we apply the random sampling using the two Poisson processes. We 
first construct two grids of random sampling 

R> p1 <- 0.2
R> p2 <- 0.3
R> newsamp <- setSampling(random = list(rdist = c(function(x) rexp(x,
+     rate = p1 * n/Terminal), function(x) rexp(x, rate = p1 *
+     n/Terminal))))


Now newsamp contains information about Poisson random sampling so we can 
apply the subsampling function to obtain the asynchronous series 


R> Y <- subsampling(X, sampling = newsamp) 


The result is visible in Figure 9.5. We can now estimate the covariance on the
new path Y

R> cce(Y)$covmat[1, 2]

[1] 1.070313

and we get a reasonably good estimate of $\theta$. Using (9.20) we also obtain the
asymptotic variance of the estimator

R> var.c <- function(T, p1, p2, sigma1, sigma2, rho) {
+     tmp_integrand1 <- function(t) (sigma1(t) * sigma2(t))^2
+     i1 <- integrate(tmp_integrand1, 0, T)
+     tmp_integrand2 <- function(t) (sigma1(t) * sigma2(t) * rho(t))^2
+     i2 <- integrate(tmp_integrand2, 0, T)
+     2 * (1/p1 + 1/p2) * i1$value + 2 * (1/p1 + 1/p2 - 1/(p1 +
+         p2)) * i2$value
+ }

and calculate the approximate standard deviation 

R> vc <- var.c(T = Terminal, p1, p2, diff1, diff2, rho)

R> sqrt(vc/n)

[1] 0.2188988
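For readers who want to see what a covariance estimator for asynchronous data does, here is a minimal base-R sketch. It assumes that the estimator is the overlap-sum estimator of Hayashi and Yoshida, i.e. the sum of products of increments over all pairs of overlapping observation intervals. The function name hy_cov and the toy data are hypothetical; this is not the implementation used by cce.

```r
# Overlap-sum covariance estimator for two series observed on different grids.
# t1, t2: increasing observation times; x1, x2: values observed at those times.
hy_cov <- function(t1, x1, t2, x2) {
  dx1 <- diff(x1)
  dx2 <- diff(x2)
  left2 <- t2[-length(t2)]   # left end points of the intervals of series 2
  right2 <- t2[-1]           # right end points of the intervals of series 2
  total <- 0
  for (i in seq_along(dx1)) {
    # intervals of series 2 overlapping the i-th interval of series 1
    overlap <- (left2 < t1[i + 1]) & (right2 > t1[i])
    total <- total + dx1[i] * sum(dx2[overlap])
  }
  total
}

# Toy data: two correlated random walks on a common grid
set.seed(1)
tt <- seq(0, 1, length.out = 501)
z1 <- rnorm(500)
z2 <- rnorm(500)
x <- c(0, cumsum(sqrt(diff(tt)) * z1))
y <- c(0, cumsum(sqrt(diff(tt)) * (0.8 * z1 + 0.6 * z2)))

# On a common grid only matching intervals overlap, so the estimator
# reduces to the usual realized covariance
hy_sync <- hy_cov(tt, x, tt, y)

# Asynchronous version: keep different random subsets of the times
keep1 <- sort(unique(c(1, sample(501, 150), 501)))
keep2 <- sort(unique(c(1, sample(501, 200), 501)))
hy_async <- hy_cov(tt[keep1], x[keep1], tt[keep2], y[keep2])
```

No interpolation or synchronization of the two series is needed, which is the main practical appeal of this kind of estimator.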


Figure 9.5 Simulated path of a two-dimensional asynchronous diffusion model.
9.3 LASSO model selection

The least absolute shrinkage and selection operator (LASSO) is a useful and
well-studied approach to the problem of model selection. Its major advantage is
the simultaneous execution of both parameter estimation and variable selection
(Efron et al. 2004; Knight and Fu 2000; Tibshirani 1996). This is possible
because the dimension of the parameter space does not change (while
it does with the information criteria approach, e.g. AIC, BIC, etc.):
the LASSO method only sets some parameters to zero to eliminate them from
the model. The LASSO method usually consists in the minimization of an $L^2$
norm under $L^1$ norm constraints on the parameters. Thus it usually implies a least
squares or maximum likelihood approach plus constraints.

Originally, the LASSO procedure was introduced for linear regression problems,
but, in recent years, this approach has been applied to time series analysis
by several authors, mainly in the case of autoregressive models. For example,
just to mention a few, Wang et al. (2007) consider the problem of shrinkage
estimation of regressive and autoregressive coefficients, while Nardi and Rinaldo
(2008) consider penalized order selection in an AR(p) model. The VAR case
was considered in Hsu et al. (2008). Very recently Caner (2009) studied the
LASSO method for general GMM estimators, also in the case of time series.

Here we present the LASSO approach for discretely observed diffusion processes.
For diffusion processes, the LASSO method requires some additional care
because the rates of convergence of the parameters in the drift and in the diffusion
coefficient are different. We point out that the usual model selection strategy
based on AIC (Uchida and Yoshida 2005) depends not only on the properties
of the estimators but also on the method used to approximate the likelihood.
Indeed, AIC requires the precise calculation of the likelihood (Iacus 2008), while
the LASSO approach presented here depends solely on the properties of the
estimator, and so the problem of likelihood approximation is not particularly
compelling.

Let $\{X_t, t \ge 0\}$ be a $d$-dimensional diffusion process solution of the following
stochastic differential equation

$$dX_t = b(\alpha, X_t)\,dt + \sigma(\beta, X_t)\,dB_t \qquad (9.21)$$

where $\alpha = (\alpha_1, \ldots, \alpha_p)' \in \Theta_p \subset \mathbb{R}^p$, $p \ge 1$, $\beta = (\beta_1, \ldots, \beta_q)' \in \Theta_q \subset \mathbb{R}^q$, $q \ge 1$,
$b : \Theta_p \times \mathbb{R}^d \to \mathbb{R}^d$, $\sigma : \Theta_q \times \mathbb{R}^d \to \mathbb{R}^{d \times m}$ and $B_t$ is a standard Brownian
motion in $\mathbb{R}^m$. We assume that the functions $b$ and $\sigma$ are known up to the
parameters $\alpha$ and $\beta$. We denote by $\theta = (\alpha, \beta) \in \Theta_p \times \Theta_q = \Theta$ the parametric
vector and by $\theta_0 = (\alpha_0, \beta_0)$ its unknown true value. Let $\Sigma(\beta, x) = \sigma(\beta, x)^{\otimes 2}$.
The sample path of $X_t$ is observed only at $n + 1$ equidistant discrete times $t_i$, such
that $t_i - t_{i-1} = \Delta_n < \infty$ for $1 \le i \le n$ (with $t_0 = 0$ and $t_{n+1} = t$). We denote by
$\mathbf{X}_n = \{X_{t_i}\}_{0 \le i \le n}$ the sample observations. The asymptotic scheme adopted
here is the following: $n\Delta_n \to \infty$, $\Delta_n \to 0$ and $n\Delta_n^2 \to 0$ as $n \to \infty$. The
process is supposed to be ergodic and the usual assumptions on the regularity of
the coefficients of the stochastic differential equation are supposed to hold. For
details see the original paper of De Gregorio and Iacus (2010a).


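In order to introduce the LASSO problem, we consider the negative
quasi-loglikelihood function $\mathbb{H}_n : \mathbb{R}^{(n+1) \times d} \times \Theta \to \mathbb{R}$

$$\mathbb{H}_n(\mathbf{X}_n, \theta) = \frac{1}{2} \sum_{i=1}^{n} \left\{ \log\det(\Sigma_{i-1}(\beta)) + \frac{1}{\Delta_n} (\Delta X_i - \Delta_n b_{i-1}(\alpha))' \Sigma_{i-1}^{-1}(\beta) (\Delta X_i - \Delta_n b_{i-1}(\alpha)) \right\} \qquad (9.22)$$

where $\Delta X_i = X_{t_i} - X_{t_{i-1}}$, $\Sigma_i(\beta) = \Sigma(\beta, X_{t_i})$ and $b_i(\alpha) = b(\alpha, X_{t_i})$. This
quasi-likelihood has been used by, e.g., Yoshida (1992), Genon-Catalot and Jacod
(1993) and Kessler (1997) to estimate stochastic differential equations because the
true transition probability density for $X_t$, $t \in [0, T]$, does not have a closed form
expression. The function (9.22) is obtained by discretization of the continuous
time stochastic differential equation (9.21) by the Euler-Maruyama scheme, that is

$$X_{t_i} - X_{t_{i-1}} = b(\alpha, X_{t_{i-1}})\Delta_n + \sigma(\beta, X_{t_{i-1}})(B_{t_i} - B_{t_{i-1}})$$

and the increments $(X_{t_i} - X_{t_{i-1}})$ are conditionally independent Gaussian random
variables for $i = 1, \ldots, n$.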
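As an illustration, the quasi-loglikelihood takes a very simple form in dimension one. The base-R sketch below assumes the Gaussian Euler form described above (a log-variance term plus a standardized squared increment) and a CKLS-type model with drift $\alpha + \beta x$ and diffusion coefficient $\sigma x^\gamma$. The function name neg_quasi_loglik and the parameter values are hypothetical choices made only for this example.

```r
# Negative quasi-loglikelihood for a one-dimensional model
# dX = (alpha + beta * X) dt + sigma * X^gamma dB, under the Euler scheme.
neg_quasi_loglik <- function(par, X, delta) {
  alpha <- par[1]; beta <- par[2]; sigma <- par[3]; gamma <- par[4]
  x <- X[-length(X)]             # X at the left end of each interval
  dX <- diff(X)                  # increments of the observed path
  drift <- alpha + beta * x
  Sigma <- (sigma * x^gamma)^2   # squared diffusion coefficient
  0.5 * sum(log(Sigma) + (dX - delta * drift)^2 / (delta * Sigma))
}

# Simulate a toy path with the Euler scheme (illustrative values only)
set.seed(123)
n <- 500
delta <- 1/12
true_par <- c(alpha = 2, beta = -0.5, sigma = 0.3, gamma = 0.5)
X <- numeric(n + 1)
X[1] <- 4
for (i in 1:n) {
  X[i + 1] <- X[i] + (true_par[1] + true_par[2] * X[i]) * delta +
    true_par[3] * X[i]^true_par[4] * sqrt(delta) * rnorm(1)
}

h_true <- neg_quasi_loglik(true_par, X, delta)
# a diffusion parameter ten times too large gives a clearly worse value
h_wrong <- neg_quasi_loglik(c(2, -0.5, 3, 0.5), X, delta)
```

Passing neg_quasi_loglik to a general-purpose optimizer such as optim would give a quasi-maximum likelihood estimate for this toy model.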

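We denote by $\dot{\mathbb{H}}_n(\mathbf{X}_n, \theta)$ the vector of the first derivatives with respect to
$\theta$ and by $\ddot{\mathbb{H}}_n(\mathbf{X}_n, \theta)$ the Hessian matrix. Let $\tilde\theta_n : \mathbb{R}^{(n+1) \times d} \to \Theta$ be the
quasi-maximum likelihood estimator of $\theta \in \Theta$, based on (9.22), that is

$$\tilde\theta_n = (\tilde\alpha_n, \tilde\beta_n)' = \arg\min_{\theta} \mathbb{H}_n(\mathbf{X}_n, \theta).$$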


Let $\mathcal{I}(\theta)$ be the positive definite and invertible Fisher information matrix at $\theta$
given by



where 




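Moreover, we consider the matrix

$$\varphi(n) = \begin{pmatrix} \frac{1}{n\Delta_n} I_p & 0 \\ 0 & \frac{1}{n} I_q \end{pmatrix}$$

where $I_p$ and $I_q$ are respectively the identity matrices of order $p$ and $q$. Let
$\Lambda_n(\theta) = \varphi(n)^{1/2}\, \ddot{\mathbb{H}}_n(\mathbf{X}_n, \theta)\, \varphi(n)^{1/2}$. Under the usual regularity conditions
(De Gregorio and Iacus 2010a), the following two properties hold true:

i) $\Lambda_n(\theta_0) \xrightarrow{p} \mathcal{I}(\theta_0)$ and $\sup_{\|\theta\| \le \epsilon_n} |\Lambda_n(\theta + \theta_0) - \Lambda_n(\theta_0)| = o_p(1)$, for $\epsilon_n \to 0$ as
$n \to \infty$;

ii) $\tilde\theta_n$ is a consistent estimator of $\theta_0$ and asymptotically Gaussian, i.e.

$$\varphi(n)^{-1/2}(\tilde\theta_n - \theta_0) \xrightarrow{d} N(0, \mathcal{I}(\theta_0)^{-1}).$$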

9.3.1 Modified LASSO objective function 

The classical adaptive LASSO objective function, in this case, should be given by

$$\mathbb{H}_n(\mathbf{X}_n, \theta) + \sum_{j=1}^{p} \lambda_{n,j} |\alpha_j| + \sum_{k=1}^{q} \gamma_{n,k} |\beta_k| \qquad (9.23)$$

where $\lambda_{n,j}$ and $\gamma_{n,k}$ assume real positive values representing an adaptive amount
of shrinkage for each element of $\alpha$ and $\beta$. The LASSO estimator is the minimizer
of the objective function (9.23). Usually, this is a nonlinear optimization
problem under $L^1$ constraints which might be numerically challenging to solve.
Nevertheless, using the approach of Wang and Leng (2007), the minimization
problem can be transformed into a quadratic minimization problem (under $L^1$
constraints) which is asymptotically equivalent to minimizing (9.23). Indeed, by
means of a Taylor expansion of $\mathbb{H}_n(\mathbf{X}_n, \theta)$ at $\tilde\theta_n$, one has immediately that

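$$\begin{aligned}
\mathbb{H}_n(\mathbf{X}_n, \theta) &= \mathbb{H}_n(\mathbf{X}_n, \tilde\theta_n) + (\theta - \tilde\theta_n)' \dot{\mathbb{H}}_n(\mathbf{X}_n, \tilde\theta_n) + \frac{1}{2} (\theta - \tilde\theta_n)' \ddot{\mathbb{H}}_n(\mathbf{X}_n, \tilde\theta_n) (\theta - \tilde\theta_n) + o_p(1) \\
&= \mathbb{H}_n(\mathbf{X}_n, \tilde\theta_n) + \frac{1}{2} (\theta - \tilde\theta_n)' \ddot{\mathbb{H}}_n(\mathbf{X}_n, \tilde\theta_n) (\theta - \tilde\theta_n) + o_p(1)
\end{aligned}$$

where the linear term vanishes because $\dot{\mathbb{H}}_n(\mathbf{X}_n, \tilde\theta_n) = 0$ at the minimizer $\tilde\theta_n$.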

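Therefore, we define the LASSO-type estimator $\hat\theta_n : \mathbb{R}^{(n+1) \times d} \to \Theta$ as the
solution of

$$\hat\theta_n = (\hat\alpha_n, \hat\beta_n)' = \arg\min_{\theta} \mathcal{F}(\theta) \qquad (9.24)$$

where

$$\mathcal{F}(\theta) = (\theta - \tilde\theta_n)' \ddot{\mathbb{H}}_n(\mathbf{X}_n, \tilde\theta_n) (\theta - \tilde\theta_n) + \sum_{j=1}^{p} \lambda_{n,j} |\alpha_j| + \sum_{k=1}^{q} \gamma_{n,k} |\beta_k|. \qquad (9.25)$$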

Then, the least squares problem based on (9.25) is asymptotically equivalent to
the original LASSO problem deriving from the objective function (9.23), but
much easier to solve numerically. The function $\mathcal{F}(\theta)$ is a penalized quadratic
form and has the advantage of providing a unified theoretical framework. Indeed,
the objective function (9.23) allows us to perform the LASSO procedure
correctly only if $\mathbb{H}_n$ is strictly convex, and this fact restricts the choice of the
possible contrast functions for the model (9.21). The function (9.25) overcomes
this limitation. We also point out that $\mathcal{F}(\theta)$ has two constraints, because
the drift and diffusion parameters $\alpha_j$ and $\beta_k$ are well separated with different
rates of convergence. It is also possible to show that the LASSO estimators
solving (9.25) satisfy the so-called oracle property, which means that: 1) the
estimator identifies the right subset model (i.e. only parameters which are truly
zero are set to zero) and 2) it has the optimal estimation rate and converges to a
Gaussian random variable $N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of the true
subset model.
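To make the structure of a penalized quadratic form like (9.25) concrete, the following base-R sketch minimizes it by cyclic coordinate descent with soft thresholding. This is only an illustration under our own assumptions: the function names soft_threshold and lasso_quadratic, the toy matrix H and the toy estimates are hypothetical, and this is not necessarily the algorithm used by the lasso function of the yuima package.

```r
# Soft-thresholding operator: shrinks z towards zero by the amount t
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Minimize (theta - theta_tilde)' H (theta - theta_tilde) + sum(penalty * |theta|)
# by cyclic coordinate descent; H is assumed symmetric and positive definite.
lasso_quadratic <- function(theta_tilde, H, penalty, n_iter = 200) {
  theta <- theta_tilde
  for (it in seq_len(n_iter)) {
    for (j in seq_along(theta)) {
      # effect of the other coordinates on the j-th partial derivative
      r <- sum(H[j, -j] * (theta[-j] - theta_tilde[-j]))
      # unpenalized minimizer along coordinate j
      z <- theta_tilde[j] - r / H[j, j]
      # exact penalized minimizer along coordinate j
      theta[j] <- soft_threshold(z, penalty[j] / (2 * H[j, j]))
    }
  }
  theta
}

# Toy example: the second unpenalized estimate is small and heavily
# penalized, so it is set exactly to zero
H <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
theta_tilde <- c(1.5, 0.05)
penalty <- c(0.1, 1)
est <- lasso_quadratic(theta_tilde, H, penalty)
est
```

Each coordinate update is an exact one-dimensional minimization, so the objective never increases from one sweep to the next.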

9.3.2 Adaptiveness of the method 

Clearly, the theoretical and practical implications of our method rely on the
specification of the tuning parameters $\lambda_{n,j}$ and $\gamma_{n,k}$. As observed in Wang and
Leng (2007), these values could be obtained by means of some model selection
criterion like generalized cross-validation, the Akaike information criterion or the
Bayes information criterion. Unfortunately, this solution is computationally heavy
and hence impractical. Therefore, we propose to choose the tuning parameters as in
Zou (2006) in the following way

$$\lambda_{n,j} = \lambda_0 |\tilde\alpha_{n,j}|^{-\delta_1}, \qquad \gamma_{n,k} = \gamma_0 |\tilde\beta_{n,k}|^{-\delta_2} \qquad (9.26)$$

where $\tilde\alpha_{n,j}$ and $\tilde\beta_{n,k}$ are the unpenalized estimators of $\alpha_j$ and $\beta_k$ respectively, and
$\delta_1, \delta_2 > 0$ are usually taken equal to one. The asymptotic results hold under the
additional conditions

$$\frac{\lambda_0}{\sqrt{n\Delta_n}} \to 0, \quad (n\Delta_n)^{\frac{\delta_1 - 1}{2}} \lambda_0 \to \infty, \quad \text{and} \quad \frac{\gamma_0}{\sqrt{n}} \to 0, \quad n^{\frac{\delta_2 - 1}{2}} \gamma_0 \to \infty$$

as $n \to \infty$, as proved in De Gregorio and Iacus (2010a).
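The effect of the adaptive choice (9.26) is easy to see numerically. In the sketch below, the function name adaptive_penalty is hypothetical, and the input values are the quasi-maximum likelihood estimates reported for the interest rates application in Section 9.3.3; we take $\lambda_0 = \gamma_0 = 1$ and exponents equal to one.

```r
# Adaptive penalties: a base constant divided by a power of the
# absolute value of the unpenalized estimate
adaptive_penalty <- function(est, lambda0 = 1, delta = 1) {
  lambda0 * abs(est)^(-delta)
}

# QMLE values from the CKLS application of Section 9.3.3
qmle <- c(sigma = 0.133, gamma = 1.443, alpha = 2.076, beta = -0.263)
w <- adaptive_penalty(qmle, lambda0 = 1, delta = 1)
round(w, 3)
```

Parameters whose unpenalized estimate is close to zero receive a large penalty, while parameters estimated far from zero are penalized only mildly.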

9.3.3 LASSO identification of the model 
for term structure of interest rates 

As an application of the LASSO approach we reanalyze the U.S. Interest Rates
monthly data from 06/1964 to 12/1989, for a total of 307 observations. These data
have been analyzed by many authors including Nowman (1997), Aït-Sahalia
(1996) and Yu and Phillips (2001), just to mention a few references. We do not
claim to give the definitive answer on the subject, but just to analyze the effect
of model selection via the LASSO in a real application. The data used for
this application were taken from the R package Ecdat by Croissant (2010).
R> library(Ecdat)
R> library(sde)
R> data(Irates)
R> rates <- Irates[, "r1"]
R> plot(rates)

The different authors all try to fit to the data in Figure 9.6 a version of the
so-called CKLS model presented in Section 5.3. Recall that the CKLS
process is the solution $X_t$ of the following stochastic differential equation:

$$dX_t = (\alpha + \beta X_t)\,dt + \sigma X_t^{\gamma}\,dB_t.$$

This model encompasses several other models depending on the number of
non-null parameters, as Table 5.1 shows. This makes clear why model selection on
the CKLS model is quite appealing for this application. Here we
estimate the parameters using the quasi-likelihood method (QMLE in the tables) in
the first stage, then set the penalties as in (9.26) and run the LASSO optimization.
To this aim, we make use of the function lasso in the yuima package. We
describe the model and the data for the yuima object


R> require(yuima)
R> X <- window(rates, start = 1964.471, end = 1989.333)
R> mod <- setModel(drift = "alpha+beta*x", diffusion = matrix("sigma*x^gamma",
+     1, 1))
R> yuima <- setYuima(data = setData(X), model = mod)


and then we let lasso estimate the CKLS parameters by using both the
quasi-maximum likelihood and the LASSO method, first using mild penalties, i.e.,
$\lambda_0 = \gamma_0 = 1$ in (9.26). This is specified in a single list argument in lasso



Figure 9.6 The U.S. Interest Rates monthly data from 06/1964 to 12/1989. 
R> lambda1 <- list(alpha = 1, beta = 1, sigma = 1, gamma = 1)
R> start <- list(alpha = 1, beta = -0.1, sigma = 0.1, gamma = 1)
R> low <- list(alpha = -5, beta = -5, sigma = -5, gamma = -5)
R> upp <- list(alpha = 8, beta = 8, sigma = 8, gamma = 8)
R> lasso1 <- lasso(yuima, lambda1, start = start, lower = low,
+     upper = upp, method = "L-BFGS-B")

Looking for MLE estimates... 

Performing LASSO estimation... 


and we obtain the quasi-maximum likelihood estimates 

R> round(lasso1$mle, 3)

 sigma  gamma  alpha   beta
 0.133  1.443  2.076 -0.263

and the LASSO estimates

R> round(lasso1$lasso, 3)

 sigma  gamma  alpha   beta
 0.131  1.449  1.486 -0.145

Further, we use strong penalties, i.e., $\lambda_0 = \gamma_0 = 10$

R> lambda10 <- list(alpha = 10, beta = 10, sigma = 10, gamma = 10)
R> lasso10 <- lasso(yuima, lambda10, start = start, lower = low,
+     upper = upp, method = "L-BFGS-B")

Looking for MLE estimates...
Performing LASSO estimation...

and check the results

R> round(lasso10$mle, 3)

 sigma  gamma  alpha   beta
 0.133  1.443  2.076 -0.263

R> round(lasso10$lasso, 3)

 sigma  gamma  alpha   beta
 0.117  1.503  0.591  0.000

Very strong penalties suggest that the model does not contain the term $\beta$ and,
in both cases, the LASSO estimation suggests $\gamma = 3/2$, therefore a model quite
close to Cox et al. (1980). Being a shrinkage estimator, the LASSO estimates
have very low standard errors compared to the other cases. Our application of
the LASSO method is reported in Table 9.1 along with the results from Yu and
Phillips (2001) just for comparison.

Table 9.1 Model selection on the CKLS model for the U.S. interest rates data.
Table taken from Yu and Phillips (2001) and updated with LASSO results.
Standard errors in parentheses when available.

Model     Estimation Method          α          β          σ         γ
Vasicek   MLE                        4.1889     -0.6072    0.8096    -
CKLS      Nowman                     2.4272     -0.3277    0.1741    1.3610
CKLS      Exact Gaussian             2.0069     -0.3330    0.1741    1.3610
                                     (0.5216)   (0.0677)
CKLS      QMLE                       2.0755     -0.2630    0.1325    1.4433
                                     (0.992)    (0.196)    (0.026)   (0.103)
CKLS      QMLE + LASSO               1.4863     -0.1454    0.1309    1.4493
          with mild penalization     (0.701)    (0.138)    (0.018)   (0.073)
CKLS      QMLE + LASSO               0.5914     0.0002     0.1168    1.5035
          with strong penalization   (0.211)    (0.005)    (0.018)   (0.073)

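We now perform a small example to illustrate the effectiveness of the LASSO
method also in the multidimensional case. We consider a two-dimensional
stochastic differential equation of the form:

$$\begin{pmatrix} dX_t^1 \\ dX_t^2 \end{pmatrix} = \begin{pmatrix} -\theta_{2.1} X_t^1 - \theta_{2.2} \\ -\theta_{2.2} X_t^2 - \theta_{2.1} \end{pmatrix} dt + \begin{pmatrix} \theta_{1.1} & 1 \\ \theta_{1.2} & 1 \end{pmatrix} \begin{pmatrix} dB_t^1 \\ dB_t^2 \end{pmatrix}, \qquad X_0^1 = 1, \quad X_0^2 = 1$$

which we prepare into a yuima model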

R> diff.matrix <- matrix(c("theta1.1", "theta1.2", "1", "1"), 2, 2)
R> drift.c <- c("-theta2.1*x1", "-theta2.2*x2", "-theta2.2", "-theta2.1")
R> drift.matrix <- matrix(drift.c, 2, 2)
R> ymodel <- setModel(drift = drift.matrix, diffusion = diff.matrix,
+     time.variable = "t", state.variable = c("x1", "x2"),
+     solve.variable = c("x1", "x2"))
R> n <- 100
R> ysamp <- setSampling(Terminal = (n)^(1/3), n = n)

YUIMA: 'delta' (re)defined.

R> yuima <- setYuima(model = ymodel, sampling = ysamp)

We set the true values for $\theta_{1.2}$ and $\theta_{2.2}$ to zero and simulate the trajectory

R> set.seed(123)
R> truep <- list(theta1.1 = 0.6, theta1.2 = 0, theta2.1 = 0.5,
+     theta2.2 = 0)
R> yuima <- simulate(yuima, xinit = c(1, 1), true.parameter = truep)

we finally apply the LASSO method

R> est <- lasso(yuima, start = list(theta2.1 = 0.8, theta2.2 = 0.2,
+     theta1.1 = 0.7, theta1.2 = 0.1), lower = list(theta1.1 = 1e-10,
+     theta1.2 = 1e-10, theta2.1 = 0.1, theta2.2 = 1e-10),
+     upper = list(theta1.1 = 4, theta1.2 = 4, theta2.1 = 4, theta2.2 = 4),
+     method = "L-BFGS-B")


Looking for MLE estimates... 
Performing LASSO estimation... 

R> unlist(truep)

theta1.1 theta1.2 theta2.1 theta2.2
     0.6      0.0      0.5      0.0

R> round(est$mle, 3)

theta1.1 theta1.2 theta2.1 theta2.2
   0.559    0.000    0.741    0.024

R> round(est$lasso, 3)

theta1.1 theta1.2 theta2.1 theta2.2
   0.558    0.000    0.670    0.000

and see that the LASSO method shrinks towards zero the two estimates of $\theta_{1.2}$
and $\theta_{2.2}$, as expected.


9.4 Clustering of financial time series 

In recent years, there has been a lot of interest in mining financial time series
data. Although many measures of dissimilarity are available in the literature (see
e.g. Liao (2005) for a review), most of them ignore the underlying structure
of the stochastic model which drives the data. Among the few measures which
take into account the properties of the data generating model we can mention
Hirukawa (2006), which considers non-Gaussian locally stationary sequences;
Piccolo (1990) proposed an AR metric and Otranto (2008) adapted it to GARCH
models. Caiado et al. (2006) used an approach based on periodograms; Xiong and
Yeung (2002) proposed a model-based clustering for mixtures of ARMA models.
Kakizawa et al. (1998) and Alonso et al. (2006) performed clustering based on
several information measures constructed on the estimated densities of the
processes. In this section we consider a distance tailored to measure the discrepancy
of discretely observed diffusion processes and apply this distance in a simple cluster
analysis. Cluster analysis is an exploratory data analysis tool that, starting from a
matrix of dissimilarities between pairs of observations in a sample, groups the
observations into clusters (subgroups) according to some rule. Roughly speaking,
rules can be of agglomerative type (individual observations are put together) or
of divisive type (the observations which are more dissimilar are moved away from
the initial group of all units). Once the first groups are formed, other rules are
necessary to decide, e.g. in the agglomerative case, how to aggregate more units
to the formed groups or how to merge groups together, etc. An interesting review of
cluster methods can be found in Kaufman and Rousseeuw (1990) and the corresponding
implementation in R is available through the cluster package.

9.4.1 The Markov operator distance 

This new dissimilarity (De Gregorio and Iacus 2010b) is based on a new application
of the results by Hansen et al. (1998) on identification of diffusion processes
observed at discrete time when the time mesh $\Delta$ between observations is not
necessarily shrinking to zero. The theory proposed in Hansen et al. (1998) has been
used in Kessler and Sørensen (1999) and Gobet et al. (2004) in parametric and
nonparametric estimation of diffusion processes respectively. The theory is based
on the fact that, when the process is not observed at high frequency, i.e. $\Delta \not\to 0$,
the observed data form a true Markov process for which it is possible to identify
the Markov operator $P_\Delta$. Consider now the regularly sampled data $X_i = X(i\Delta)$,
$i = 0, \ldots, N$, from the sample path of $\{X_t, 0 \le t \le T\}$, where $\Delta > 0$ is not
shrinking to 0 and is such that $T = N\Delta$. The process $X = \{X_i\}_{i = 0, \ldots, N}$ is a Markov
process and, under mild regularity conditions, all the mathematical properties of
the model are embodied in the transition operator defined as follows:

$$P_\Delta f(x) = E\{f(X_i) \mid X_{i-1} = x\}.$$

Notice that $P_\Delta$ depends on the transition density from $X_{i-1}$ to $X_i$, so we put
explicitly the dependence on $\Delta$ in the notation. This operator is associated with
the infinitesimal generator of the diffusion, namely the following operator on the
space of continuous and twice differentiable functions $f(\cdot)$

$$L_{b,\sigma} f(x) = \frac{\sigma^2(x)}{2} f''(x) + b(x) f'(x).$$

We assume that, under condition (9.2), the invariant density $\mu = \mu_{b,\sigma}(\cdot)$ of the
process $X_t$ exists and its explicit form is:

$$\mu_{b,\sigma}(x) = \frac{m(x)}{C_0} = \frac{\exp\left\{2 \int_{\tilde{x}}^{x} \frac{b(y)}{\sigma^2(y)}\,dy\right\}}{C_0\, \sigma^2(x)}.$$
Then, the operator $L_{b,\sigma}$ is unbounded but self-adjoint negative on $L^2(\mu) = \{f :
\int |f|^2 d\mu < \infty\}$ and the functional calculus gives the correspondence (in terms
of operator notation)

$$P_\Delta = \exp\{\Delta L_{b,\sigma}\}.$$

This relation was first noticed by Hansen et al. (1998) and Chen et al.
(1997). For a given $L^2$-orthonormal basis $\{\phi_j, j \in J\}$ of $L^2([l, r])$, where
$J$ is an index set, following Gobet et al. (2004) it is possible to obtain the
matrix $\hat{P}_\Delta(X) = [(\hat{P}_\Delta)_{j,k}(X)]_{j,k \in J}$, which is an estimator of $\langle P_\Delta \phi_j, \phi_k \rangle_{\mu_{b,\sigma}}$,
where

$$(\hat{P}_\Delta)_{j,k}(X) = \frac{1}{2N} \sum_{i=1}^{N} \left\{ \phi_j(X_{i-1}) \phi_k(X_i) + \phi_k(X_{i-1}) \phi_j(X_i) \right\}, \quad j, k \in J. \qquad (9.27)$$

The terms $(\hat{P}_\Delta)_{j,k}$ are approximations of $\langle P_\Delta \phi_j, \phi_k \rangle_{\mu_{b,\sigma}}$, that is, the action
of the transition operator on the state space with respect to the unknown scalar
product $\langle \cdot, \cdot \rangle_{\mu_{b,\sigma}}$, and hence can be used as a ‘proxy’ of the probability structure
of the model. Therefore, we introduce the following dissimilarity measure.

Definition 9.4.1 Let X and Y be discrete time observations from two diffusion
processes. The Markov Operator distance is defined as

$$d_{MO}(X, Y) = \|\hat{P}_\Delta(X) - \hat{P}_\Delta(Y)\|_1 = \sum_{j,k \in J} \left| (\hat{P}_\Delta)_{j,k}(X) - (\hat{P}_\Delta)_{j,k}(Y) \right|, \qquad (9.28)$$

where $(\hat{P}_\Delta)_{j,k}(\cdot)$ is calculated as in (9.27) separately for X and Y.

Notice that $d_{MO}(X, Y)$ is the element-wise $L^1$ distance for matrices, not
simply a dissimilarity measure (i.e. it also satisfies the triangle inequality).

Like the invariant density $\mu_{b,\sigma}$, the Markov operator itself cannot perfectly
identify the underlying process, in the sense that, for some $(b_1, \sigma_1)$ there might
exist another pair $(b_2, \sigma_2)$ such that $\mu_{b_1,\sigma_1}(x) = \mu_{b_2,\sigma_2}(x)$. The same
considerations apply to the infinitesimal generator and hence to the Markov operator.
Nevertheless, the distance $d_{MO}$ helps in finding similarities between two (or
more) processes in terms of the action of their Markov operators.
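A rough base-R sketch of (9.27) and (9.28) may help to fix ideas. To keep it self-contained we replace the B-spline basis by a cosine basis, which is orthonormal on $[l, r]$; this substitution, the function names cos_basis, mo_matrix and mo_dist, the number of basis functions and the 5% padding of the support on each side are all our own assumptions, and the code is not the MOdist implementation of the sde package.

```r
# Cosine basis, orthonormal on [l, r] (a stand-in for orthonormal B-splines)
cos_basis <- function(x, j, l, r) {
  u <- (x - l) / (r - l)
  if (j == 0) rep(1 / sqrt(r - l), length(x)) else sqrt(2 / (r - l)) * cos(j * pi * u)
}

# Estimated Markov operator matrix as in (9.27), with J >= 2 basis functions
mo_matrix <- function(X, J, l, r) {
  N <- length(X) - 1
  Phi <- sapply(0:(J - 1), function(j) cos_basis(X, j, l, r))  # (N + 1) x J
  P0 <- Phi[1:N, , drop = FALSE]         # basis evaluated at X_{i-1}
  P1 <- Phi[2:(N + 1), , drop = FALSE]   # basis evaluated at X_i
  (t(P0) %*% P1 + t(P1) %*% P0) / (2 * N)
}

# Element-wise L1 distance between the two estimated matrices, as in (9.28)
mo_dist <- function(X, Y, J = 10) {
  rng <- range(c(X, Y))
  pad <- 0.05 * diff(rng)   # common support, slightly enlarged
  l <- rng[1] - pad
  r <- rng[2] + pad
  sum(abs(mo_matrix(X, J, l, r) - mo_matrix(Y, J, l, r)))
}

# Toy example: two random walks with different volatility
set.seed(123)
X <- cumsum(rnorm(200))
Y <- cumsum(rnorm(200, sd = 2))
mo_dist(X, Y)
```

Both matrices must be computed on the same support and with the same basis, otherwise their entries are not comparable.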

9.4.2 Application to real data 

In this section we compare the Markov operator distance with a few other
distances. We denote by $X = \{X_i, i = 1, \ldots, N\}$ and $Y = \{Y_i, i = 1, \ldots, N\}$ two
sets of discrete observations from continuous time diffusion processes. We compare
the following distances.
9.4.2.1 The Markov-operator distance

The Markov operator distance $d_{MO}$ is calculated using formula (9.28). As in Reiß
(2003) we deal with a basis of 50 orthonormal B-splines of degree 10 on a compact
support (see Ramsay and Silverman (2005)). As compact support we consider
the observed support of all simulated diffusion paths enlarged by 10%. This
distance is implemented in the function MOdist in the package sde.

9.4.2.2 Short-time-series distance 

Proposed by Möller-Levet et al. (1978), this distance is based on the idea of
considering each time series as a piecewise linear function and comparing the slopes
between all the interpolants. It reads as

$$d_{STS}(X, Y) = \sqrt{\sum_{i=2}^{N} \left( \frac{X_i - X_{i-1}}{\Delta} - \frac{Y_i - Y_{i-1}}{\Delta} \right)^2}.$$

This measure is essentially designed to discover similarities in the volatility
between two time series regardless of the average level of the process, i.e. one
process and a shifted version of it will have zero distance.

9.4.2.3 The Euclidean distance 

The usual Euclidean distance

$$d_{EUC}(X, Y) = \sqrt{\sum_{i=1}^{N} (X_i - Y_i)^2}$$

is one of the most used in the applied literature. We use it only for comparison
purposes.

9.4.2.4 Dynamic time warping distance 

The Euclidean distance is very sensitive to distortion in the time axis and may lead
to poor results for sequences which are similar, but locally out of phase (Corduas
2007). The Dynamic Time Warping (DTW) distance was introduced originally in
speech recognition analysis (Sakoe and Chiba 1978; Wang and Gasser 1997).
DTW allows for nonlinear alignments between time series not necessarily of the
same length. Essentially, all shiftings between two time series are attempted and
each time a cost function is applied (e.g. a weighted Euclidean distance between
the shifted series). The minimum of the cost function over all possible shiftings
is the dynamic time warping distance $d_{DTW}$. In our applications we use the
Euclidean distance in the cost function and the algorithm as implemented in the
R package dtw (Giorgino 2009).
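The minimization over all alignments is usually carried out by dynamic programming. The base-R sketch below shows the simplest textbook recursion with the absolute difference as local cost. The function name dtw_distance is hypothetical, and the recursion used here is not necessarily the default step pattern of the dtw package, so the numerical values may differ from those returned by that package.

```r
# Basic dynamic time warping by dynamic programming.
# D[i + 1, j + 1] holds the minimal cumulative cost of aligning
# x[1..i] with y[1..j].
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in 1:n) {
    for (j in 1:m) {
      cost <- abs(x[i] - y[j])   # local cost of matching x[i] with y[j]
      # extend the cheapest of the three admissible predecessor alignments
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n + 1, m + 1]
}

# A series and a locally 'stretched' copy of it are at distance zero,
# even though they have different lengths
dtw_distance(c(1, 2, 3), c(1, 2, 2, 3))
```

The example illustrates why DTW is robust to local phase shifts: the repeated value in the second series is absorbed by the warping path at no cost.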

9.4.2.5 The data 

We consider time series of daily closing quotes, from 01 March 2006 to 31
December 2007, for the following 20 financial assets: Microsoft Corporation
(MSOFT in the plots), Advanced Micro Devices Inc. (AMD), Dell Inc. (DELL),
Intel Corporation (INTEL), Hewlett-Packard Co. (HP), Sony Corp. (SONY),
Motorola Inc. (MOTO), Nokia Corp. (NOKIA), Electronic Arts Inc. (EA), LG
Display Co., Ltd. (LG), Borland Software Corp. (BORL), Koninklijke Philips
Electronics NV (PHILIPS), Symantec Corporation (SYMATEC), JPMorgan
Chase & Co (JPM), Merrill Lynch & Co., Inc. (MLINCH), Deutsche Bank AG
(DB), Citigroup Inc. (CITI), Bank of America Corporation (BAC), Goldman
Sachs Group Inc. (GSACHS) and Exxon Mobil Corp. (EXXON). Quotes
come from NYSE/NASDAQ (source: Yahoo.com). Missing values (the same
19 holidays out of 520 daily observations) have been linearly interpolated. These
assets come from electronic hardware, appliance and software vendors or
producers, financial institutions of different types and an oil company. These
data are preloaded in the data set quotes of the sde package and plotted in
Figure 9.7


R> require(sde) 

R> data(quotes) 

R> Series <- quotes 
R> nSeries <- dim(Series)[2] 

R> plot(Series, main = "", xlab = "")

As anticipated, we now cluster these financial data using the four distances $d_{MO}$,
$d_{EUC}$, $d_{STS}$ and $d_{DTW}$. For the Short-Time-Series distance $d_{STS}$ we write our
own code first

R> STSdist <- function(data) {
+     nSer <- NCOL(data)
+     d <- matrix(0, nSer, nSer)
+     colnames(d) <- colnames(data)
+     rownames(d) <- colnames(data)
+     DELTA <- deltat(data)
+     for (i in 1:(nSer - 1)) for (j in (i + 1):nSer) {
+         d[i, j] <- sqrt(sum((diff(data[, i])/DELTA - diff(data[,
+             j])/DELTA)^2))
+         d[j, i] <- d[i, j]
+     }
+     invisible(d)
+ }

Figure 9.7 Paths of the 20 assets considered, from 01 March 2006 to 31 December
2007.

We now use the MOdist function from the sde package, our function STSdist, the
standard R function dist for the Euclidean distance and finally the dtw
package. With the only aim of making the distances comparable, we normalize
each by its maximum value, so that all distances lie in the interval [0, 1]


R> dMO <- MOdist(Series) 

R> dMO <- dMO/max(dMO) 

R> dEUC <- dist(t(Series)) 

R> dEUC <- dEUC/max(dEUC) 

R> dSTS <- STSdist(Series) 

R> dSTS <- dSTS/max(dSTS) 

R> require(dtw)

Loaded dtw v1.14-3. See ?dtw for help, citation("dtw") for usage
conditions.

R> dDTW <- dist(t(Series), method = "dtw")
R> dDTW <- dDTW/max(dDTW)

We now apply hierarchical clustering with complete linkage method and represent 
the dendrograms in Figure 9.8 with the following code 

R> cl <- hclust(dMO)
R> plot(cl, main = "Markov Operator Distance", xlab = "", ylim = c(0,
+     1))
R> rect.hclust(cl, k = 6, border = gray(0.5))
R> cl1 <- hclust(as.dist(dEUC))
R> plot(cl1, main = "Euclidean Distance", xlab = "", ylim = c(0,
+     1))
R> rect.hclust(cl1, k = 6, border = gray(0.5))
R> cl2 <- hclust(as.dist(dSTS))
R> plot(cl2, main = "STS Distance", xlab = "", ylim = c(0, 1))
R> rect.hclust(cl2, k = 6, border = gray(0.5))
R> cl3 <- hclust(as.dist(dDTW))
R> plot(cl3, main = "DTW Distance", xlab = "", ylim = c(0, 1))
R> rect.hclust(cl3, k = 6, border = gray(0.5))

Figure 9.8 requires some explanation. The dendrogram in each plot represents
the hierarchical structure of the clusters. If two observations are separated by a
long vertical line it means that their relative distance is high, otherwise they are
close. So, for example, looking at the top-left panel, according to the $d_{MO}$ distance,
HP and PHILIPS are close, as well as MSOFT and DELL, but as a group, {HP,
PHILIPS} is distant from the other group {MSOFT, DELL}. In turn, the group
formed by {HP, PHILIPS, MSOFT, DELL} is homogeneous and distant from
the other homogeneous group {AMD, SYMATEC, NOKIA, INTEL, MOTO,
LG}. In particular, LG is separated by a long distance from, e.g., PHILIPS. And
so forth.

In Figure 9.8 the dendrogram for the $d_{MO}$ distance identifies 5 or 6 groups
and in particular isolates BORL and ‘DB + GSACHS’ into separate clusters
very clearly (the difference between 5 and 6 groups is that in the 6-group
clustering ‘MLINCH + EXXON’ are put in a separate cluster). To isolate the
BORL asset via the dendrograms of $d_{EUC}$ and $d_{DTW}$ we need to cut at least
into 6 groups. The side effect of this cutting is that DB and GSACHS go
into different clusters for these metrics. The metric $d_{STS}$ does not appear to give
a sharp indication on how to separate clusters. We have then decided to cut all the
dendrograms into 6 groups; the result is represented by the gray boxes.
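The mechanics of cutting a dendrogram can be checked on a toy example using only base R. The dissimilarity matrix below is invented for illustration (four hypothetical items A, B, C, D, with A close to B and C close to D); hclust uses the complete linkage method by default, which we state explicitly.

```r
# Toy dissimilarity matrix: A and B are close, C and D are close,
# and the two pairs are far from each other
m <- matrix(c(0, 1, 4, 5,
              1, 0, 4, 5,
              4, 4, 0, 1,
              5, 5, 1, 0), 4, 4,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
d <- as.dist(m)

# Agglomerative clustering with complete linkage
cl_toy <- hclust(d, method = "complete")

# Cut the dendrogram into two groups and inspect the membership
groups <- cutree(cl_toy, k = 2)
groups
```

The vector returned by cutree assigns a group label to each item, which is the kind of object compared across distances in the rest of this section.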

Figure 9.8 Clustering according to different distances (panels: Markov Operator
Distance, Euclidean Distance, STS Distance, DTW Distance). Distances normalized to
1 just for graphical representation. Although the markers of the terminal nodes
go below the zero line (see e.g. bottom-right plot), the final nodes are obtained
cutting the dendrogram above the zero line, which is represented as a dotted line
just to help visualization.

Different methods produce different groupings but with some overlap. In
order to be more specific about the similarities of the results produced by different
methods, we make use of a similarity measure proposed by Gavrilov et al.
(2000). Given two clusterings $C = C_1, \ldots, C_K$ (the clusters formed by adopting
one distance) and $C' = C'_1, \ldots, C'_{K'}$ (the clustering obtained using another
distance), we compute the following similarities

$$\mathrm{sim}(C_i, C'_j) = 2\, \frac{|C_i \cap C'_j|}{|C_i| + |C'_j|}, \quad i = 1, \ldots, K, \quad j = 1, \ldots, K',$$

and the final cluster similarity index is given by the formula

$$\mathrm{Sim}(C, C') = \frac{1}{K} \sum_{i=1}^{K} \max_{j = 1, \ldots, K'} \mathrm{sim}(C_i, C'_j). \qquad (9.29)$$

This index is not symmetric, so we also apply the symmetrized version of the
index, namely $(\mathrm{Sim}(C, C') + \mathrm{Sim}(C', C))/2$, because the real number of clusters
is not known in advance. In formula (9.29) $K$ and $K'$ may be different, although
they are not in our case. The similarity index will return 0 if the two clusterings are
completely dissimilar and 1 if they are the same.
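A small worked example shows why the index (9.29) is not symmetric. The function sim_index below is a hypothetical helper written directly from the formula (maximum over the second clustering, average over the first), and the two label vectors are invented for illustration.

```r
# Cluster similarity index written directly from (9.29);
# g1 and g2 are vectors of cluster labels for the same items.
sim_index <- function(g1, g2) {
  lab1 <- unique(g1)
  lab2 <- unique(g2)
  s <- matrix(0, length(lab1), length(lab2))
  for (i in seq_along(lab1)) {
    for (j in seq_along(lab2)) {
      a <- which(g1 == lab1[i])   # items in cluster i of the first clustering
      b <- which(g2 == lab2[j])   # items in cluster j of the second clustering
      s[i, j] <- 2 * length(intersect(a, b)) / (length(a) + length(b))
    }
  }
  mean(apply(s, 1, max))   # average over i of the best match over j
}

# Six items: three pairs versus one group of four plus one pair
g1 <- c(1, 1, 2, 2, 3, 3)
g2 <- c(1, 1, 1, 1, 2, 2)
s12 <- sim_index(g1, g2)   # (2/3 + 2/3 + 1) / 3 = 7/9
s21 <- sim_index(g2, g1)   # (2/3 + 1) / 2 = 5/6
(s12 + s21) / 2            # symmetrized version
```

The two values differ because the average is taken over a different number of clusters in each direction, which motivates the symmetrized version.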


R> Sim <- function(g1, g2) {
+     G1 <- unique(g1)
+     G2 <- unique(g2)
+     l1 <- length(G1)
+     l2 <- length(G2)
+     sim <- matrix(, l1, l2)
+     for (i in 1:l1) {
+         idx <- which(g1 == i)
+         for (j in 1:l2) {
+             idx2 <- which(g2 == j)
+             sim[i, j] <- 2 * length(intersect(idx, idx2))/(length(idx) +
+                 length(idx2))
+         }
+     }
+     sum(apply(sim, 2, max))/l1
+ }


R> G <- cutree(cl, k = 6)
R> G1 <- cutree(cl1, k = 6)
R> G2 <- cutree(cl2, k = 6)
R> G3 <- cutree(cl3, k = 6)
R> A <- matrix(, 4, 4)
R> A[1, 1] <- Sim(G, G)
R> A[1, 2] <- Sim(G, G1)
R> A[1, 3] <- Sim(G, G2)
R> A[1, 4] <- Sim(G, G3)
R> A[2, 1] <- Sim(G1, G)
R> A[2, 2] <- Sim(G1, G1)
R> A[2, 3] <- Sim(G1, G2)
R> A[2, 4] <- Sim(G1, G3)
R> A[3, 1] <- Sim(G2, G)
R> A[3, 2] <- Sim(G2, G1)
R> A[3, 3] <- Sim(G2, G2)
R> A[3, 4] <- Sim(G2, G3)
R> A[4, 1] <- Sim(G3, G)
R> A[4, 2] <- Sim(G3, G1)
R> A[4, 3] <- Sim(G3, G2)
R> A[4, 4] <- Sim(G3, G3)
R> S <- (A + t(A))/2

The results are presented in Table 9.2. The similarity matrix in Table 9.2 
shows that $d_{EUC}$ and $d_{DTW}$ form the same groups, i.e. they are 
essentially the same metric for this data set. The clustering obtained using 
$d_{MO}$ is only partially in agreement with $d_{EUC}$ and $d_{DTW}$ (0.84). 
The difference is mainly in the placement of the subgroups ‘HP + PHILIPS’ and 
‘MSOFT + DELL’. Further, the $d_{MO}$ distance puts ‘DB + GSACHS’ together, 
which makes sense for this distance, probably because these two time series 
have the highest volatilities. 

EA goes together with SONY in all dendrograms, which is not unrealistic given 
that the company essentially produces software for game consoles. The methods 
also agree on the placement of CITI, BAC and JPM. To emphasize the comparison 
in terms of levels, drift and volatilities of the 20 time series considered, 
in Figure 9.9 we also plot the data using the same vertical scale. 

In summary, all but the $d_{STS}$ distance provide similar evidence. 
Nevertheless, $d_{MO}$ easily separates BORL (an outlier if we think of the 
levels and the volatility of the time series, see Figure 9.9) and 
‘GSACHS + DB’, while with the other two competitors, in order to separate 
BORL, we need to force an additional split which separates GSACHS and DB. This 
looks quite unfortunate from a substantive point of view. Of course, this is 
merely an exercise and a simple cluster analysis cannot go deeper than this. 
In fact, other financial and economic considerations have to be made when 
analyzing the composition of the clusters obtained by any method. 
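
As a rough, self-contained illustration of how a similarity index of this kind behaves, the sketch below compares label vectors on toy data. It assumes that the index averages, over the clusters of the first partition, the best overlap 2|C_i ∩ C'_j| / (|C_i| + |C'_j|) with any cluster of the second partition. The function name sim_index and the label vectors are hypothetical, and the code is not meant to reproduce the Sim function used above.

```r
# Hypothetical helper (not the Sim function of the text):
# similarity between two partitions given as label vectors.
sim_index <- function(g1, g2) {
  k1 <- sort(unique(g1))
  k2 <- sort(unique(g2))
  s <- matrix(0, length(k1), length(k2))
  for (i in seq_along(k1)) {
    a <- which(g1 == k1[i])
    for (j in seq_along(k2)) {
      b <- which(g2 == k2[j])
      # overlap between cluster i of g1 and cluster j of g2
      s[i, j] <- 2 * length(intersect(a, b)) / (length(a) + length(b))
    }
  }
  # average of the best match found for each cluster of g1
  mean(apply(s, 1, max))
}

g_a <- c(1, 1, 2, 2, 3, 3)
g_b <- c(2, 2, 1, 1, 3, 3)   # same groups as g_a, labels permuted
g_c <- c(1, 1, 1, 2, 2, 3)   # a genuinely different partition
same  <- sim_index(g_a, g_b)
other <- sim_index(g_a, g_c)
```

Under this assumed definition the index does not depend on how the clusters are labelled, so same is equal to 1, while other is strictly smaller than 1.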

Table 9.2 Similarity matrix between the clusters formed by different metrics. 
Similarity calculated according to the similarity index defined in (9.29) (left 
table) and its symmetrized version (right table). 

             Similarity index (9.29)          Symmetrized version
          d_MO   d_EUC  d_STS  d_DTW      d_MO   d_EUC  d_STS  d_DTW
d_MO      1      0.84   0.6    0.84       1      0.81   0.54   0.81
d_EUC     0.79   1      0.71   1                 1      0.69   1
d_STS     0.48   0.67   1      0.67                     1      0.69
d_DTW     0.79   1      0.71   1                               1

Figure 9.9 Paths of the 20 assets considered in Figure 9.7 represented on the 
same scale to highlight differences in the levels and volatilities. 

9.4.3 Sensitivity to misspecification 

As mentioned, one rarely knows whether the number of clusters to select is 
exactly the given number, or whether the observed data really follow the 
assumptions of Section 9.4.1. To test the robustness of the Markov operator 
distance against model misspecification we report an experiment taken from the 
original paper of De Gregorio and Iacus (2010b). This experiment simulates 23 
paths according to the six different models $M_j$, $j = 1, \ldots, 6$, obtained 
via the combinations of drift $b_k$ and diffusion coefficients $\sigma_k$, 
$k = 1, \ldots, 4$, presented in the following table: 



             $\sigma_1(x)$   $\sigma_2(x)$   $\sigma_3(x)$   $\sigma_4(x)$
$b_1(x)$     M1                              M4
$b_2(x)$                     M2              M3
$b_3(x)$                                                     M5
$b_4(x)$                                                     M6

where 


$$b_1(x) = 1 - 2x, \quad b_2(x) = 1.5(0.9 - x), \quad b_3(x) = 1.5(0.5 - x), \quad b_4(x) = 5(0.05 - x)$$

and 

$$\sigma_1(x) = 0.5 + 2x(1 - x), \quad \sigma_2(x) = \sqrt{0.55\,x(1 - x)}, \quad \sigma_3(x) = \sqrt{0.1\,x(1 - x)}, \quad \sigma_4(x) = \sqrt{0.8\,x(1 - x)}.$$

For each model $M_j$ a different number $n_j$ of trajectories has been simulated 
in order to have an unbalanced simulation design, i.e. $n_1 = 5$, $n_2 = 3$, 
$n_3 = 4$, $n_4 = 3$, $n_5 = 4$, $n_6 = 4$. Further, one trajectory generated 
with model $M_1$, say $X^1$, is reversed around the line y = 1, i.e. 
$\tilde X^1 = 1 - X^1$; hence $\tilde X^1$ has drift $-b_1(x)$ and the same 
quadratic variation as $X^1$. So it still belongs to the class $M_1$ with 
respect to volatility. Then, an additional trajectory was simulated using model 
$M_1$ but with a different initial value. By the ergodic property of the 
simulated path, its invariant law still belongs to model $M_1$. Therefore, 
finally $n_1 = 7$. 

Each path was simulated using the (second) Milstein scheme (see e.g. Kloeden 
and Platen (1999) or Iacus (2008)) with time lag $\delta = 10^{-3}$. 
Observations were then resampled at rate $\Delta = 0.01$ and observed paths of 
length $N = 500$ and $N = 1000$ were used in the analysis in order to capture 
sample size effects. Because the number of clusters is known in advance, i.e. 
$K = 6$, we can use the cluster similarity index (9.29) for the two clusterings 
$C = C_1, \ldots, C_K$ (the real clusters formed by the six models) and 
$C' = C'_1, \ldots, C'_K$ (the clustering obtained using one of the above 
distances). 
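
The simulate-then-resample design can be sketched for a single path of model $M_1$ in base R. Note that this sketch swaps in the simpler Euler scheme in place of the second Milstein scheme used in the experiment, and that the seed and the initial value 0.5 are assumptions made only for illustration.

```r
set.seed(1)                               # assumed seed, for reproducibility only
b1 <- function(x) 1 - 2 * x               # drift of model M1
s1 <- function(x) 0.5 + 2 * x * (1 - x)   # diffusion coefficient of model M1

delta <- 1e-3                  # time lag used for the simulation
Delta <- 0.01                  # sampling rate of the observations
N <- 500                       # number of observed increments
step <- round(Delta / delta)   # simulated steps per observation
n_fine <- N * step

x <- numeric(n_fine + 1)
x[1] <- 0.5                    # assumed initial value
for (i in seq_len(n_fine)) {
  # Euler step (not the second Milstein scheme of the text)
  x[i + 1] <- x[i] + b1(x[i]) * delta + s1(x[i]) * sqrt(delta) * rnorm(1)
}

# keep one point every 'step' simulated points
obs <- x[seq(1, n_fine + 1, by = step)]
```

The vector obs contains the initial value plus N resampled observations; the finer simulation grid is used only to reduce the discretization error of the scheme.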

Four different experiments were performed. In all cases hierarchical clustering 
with complete linkage method was used. 
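
The clustering step itself can be sketched with base R functions. The example below uses the Euclidean distance on toy data (two well-separated groups of three series each); the data, the seed and the object names are assumptions made for illustration, and the other distances discussed in this section would enter the same call to hclust through a different dist object.

```r
set.seed(2)                                         # assumed seed
low  <- matrix(rnorm(3 * 50, mean = 0), nrow = 3)   # three series around 0
high <- matrix(rnorm(3 * 50, mean = 5), nrow = 3)   # three series around 5
series <- rbind(low, high)                          # one series per row

d <- dist(series)                        # Euclidean distance between rows
cl <- hclust(d, method = "complete")     # complete linkage
groups <- cutree(cl, k = 2)              # cut the dendrogram into 2 groups
```

Here groups is an integer vector of cluster labels, one per series, which is the kind of object compared by the similarity index.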

9.4.3.1 Experiment 1: Nonperturbed, correctly specified 

Simulate 25 trajectories according to the above scheme; calculate the distance 
matrixes $d_{MO}$, $d_{STS}$, $d_{EUC}$ and $d_{DTW}$ and run the clustering. 
Cut the dendrograms into $K = K' = 6$ groups. Calculate the Sim index for each 
clustering solution. 

9.4.3.2 Experiment 2: Nonperturbed, misspecified 

Simulate 25 trajectories according to the above scheme; calculate the four 
distances and run the cluster analysis. Cut the dendrograms into $K' = 5$ 
groups, while the real number of groups is $K = 6$. Calculate the Sim index for 
each clustering solution. 

9.4.3.3 Experiment 3: Perturbed, correctly specified 

Simulate 25 trajectories according to the above scheme. Perturb the experiment 
by adding 2 trajectories from an ARIMA(1,0,1) process with mean 0.5, AR 
coefficient 0.9, MA coefficient −0.22 and Gaussian innovations N(0, 0.01) 
(the parameters of the model are chosen in such a way that the simulated 
trajectories look qualitatively similar to the ones in Experiment 1). 
Calculate the four distances, use the same clustering approach as in 
Experiment 1, set $K = 7$ and cut the dendrograms into $K' = 7$ groups. 
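
A perturbing trajectory of this kind could be generated with arima.sim from the stats package, as in the sketch below. It assumes that N(0, 0.01) denotes a variance of 0.01, i.e. a standard deviation of 0.1, and that the mean 0.5 is obtained by shifting a zero-mean ARMA(1,1) path; the seed and the length 500 are likewise illustrative choices.

```r
set.seed(3)   # assumed seed
# zero-mean ARMA(1,1) path with Gaussian innovations, shifted to mean 0.5
pert <- 0.5 + arima.sim(model = list(ar = 0.9, ma = -0.22),
                        n = 500, sd = 0.1)
```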

9.4.3.4 Experiment 4: Perturbed, misspecified 

Proceed as in Experiment 3, set $K = 7$ and cut the dendrograms into $K' = 6$ 
groups. 

Each experiment is replicated only 100 times and the average value of the 
cluster similarity index Sim is reported in Table 9.3 for the sample sizes 
$N = 500$ (top) and $N = 1000$ (bottom). The number of replications is limited 
due to the excessively long computational time of the DTW distance in 
dimension 23. To test the stability of the Monte Carlo results of the first 
100 replications, we drop $d_{DTW}$ from the Monte Carlo analysis and replicate 
each of the four experiments 5000 times. Table 9.3 also reports, in 
parentheses, the average values calculated over the 5000 replications. 

Experiment 3 corresponds to a perturbation of the diffusion setup with an 
ARIMA process, while Experiment 4 corresponds to a misspecified setting: there 
are $K = 7$ real clusters, but we induce misclassification by selecting only 
$K' = 6$ groups. In Experiment 2 there is only misspecification, in that the 
number of real groups $K$ is higher than the number of groups $K'$ generated by 
the clustering. 

Table 9.3 Results of the simulation experiments. Average values of the Sim 
index over 100 replications and, with the exclusion of the $d_{DTW}$ distance, 
5000 replications (in parentheses). $\Delta = 0.01$, sample size $N = 500$ 
(top) and $N = 1000$ (bottom). 

Experiment: N = 500                  d_MO     d_EUC    d_STS    d_DTW
Nonperturbed, correctly specified    0.84     0.49     0.27     0.69
                                     (0.83)   (0.49)   (0.27)   (-)
Nonperturbed, misspecified           0.70     0.44     0.24     0.60
                                     (0.69)   (0.43)   (0.24)   (-)
Perturbed, correctly specified       0.81     0.45     0.39     0.65
                                     (0.81)   (0.45)   (0.39)   (-)
Perturbed, misspecified              0.71     0.41     0.37     0.58
                                     (0.70)   (0.41)   (0.37)   (-)

Experiment: N = 1000                 d_MO     d_EUC    d_STS    d_DTW
Nonperturbed, correctly specified    0.94     0.51     0.27     0.69
                                     (0.93)   (0.50)   (0.26)   (-)
Nonperturbed, misspecified           0.75     0.45     0.24     0.63
                                     (0.75)   (0.45)   (0.24)   (-)
Perturbed, correctly specified       0.91     0.47     0.39     0.67
                                     (0.90)   (0.46)   (0.39)   (-)
Perturbed, misspecified              0.78     0.43     0.37     0.59
                                     (0.78)   (0.42)   (0.37)   (-)

From Table 9.3 we see that all methods perform better in Experiment 1, although 
a clear ordering emerges, in all experiments, in the ability of the different 
metrics to discover the correct groups: 
$d_{MO} \prec d_{DTW} \prec d_{EUC} \prec d_{STS}$, where $d_1 \prec d_2$ 
means ‘distance $d_1$ classifies better than $d_2$’. In the case of 
perturbation (Experiment 3) one would expect the Markov operator distance to 
fail to detect the ARIMA group, and would not expect any change in the 
performance of the other metrics, because they do not assume a particular 
stochastic structure of the model. But Table 9.3 shows that all methods are 
equally affected and $d_{MO}$ looks quite robust. Although there is a decrease 
in the performance of $d_{MO}$ in the misspecified case (Experiment 4), the 
$d_{MO}$ distance still performs much better than its competitors. All methods 
improve as the number of observations $N$ increases, but the improvement of 
$d_{MO}$ is particularly remarkable. This is due to the property of the 
estimator of the Markov operator, which gets better and better as the sample 
size increases. 


9.5 Bibliographical notes 

The study of volatility, realized variance, and power variations has a long 
history in finance. Early studies are probably Andersen and Bollerslev (1998), 
Andersen et al. (2003) and Barndorff-Nielsen and Shephard (2004), from which 
it is possible to get all the relevant references. More recently the problem 
of estimation under nonsynchronicity was considered, for example, in Hayashi 
and Yoshida (2005, 2008). In parallel with the study of nonsynchronicity, high 
frequency data from finance also revealed the microstructure noise effect and 
its impact on estimates of all the above quantities. Some early references on 
the subject are Ait-Sahalia et al. (2005), Bandi and Russell (2006) and Hansen 
and Lunde (2006). The problem of model selection was considered in Uchida and 
Yoshida (2005) and in the recent work on the Lasso method presented in this 
chapter. Both works are good sources of references in this direction. Finally, 
clustering and other methods for continuous time financial models have been 
considered in De Gregorio and Iacus (2010b) and references therein. 

Appendix A 

‘How to’ guide to R 


This appendix is a compact guide to the R language which focuses on special 
aspects of the language that are relevant to this book. A review of R packages 
useful in finance is given in Appendix B. Along with these pages, the reader 
is also invited to read the quick guide called ‘An Introduction to R’ that 
comes with every installed version of R or the introductory book on the R 
environment by Dalgaard (2008). 

A.1 Something to know first about R 

All commands in R must be typed via the command-line interface, and graphical 
user interfaces (GUI) are very limited, with a few exceptions, such as the 
package Rcmdr by John Fox. 

All the commands are given as inputs to R after the prompt > (R> in this 
book) and are analyzed by the R parser after the user presses the 
‘return’/‘enter’ key (or a newline character is encountered in the case of a 
script file). 

R> cat("help me!") 

help me! 


R inputs can be multiline; hence, if the R parser thinks that the user did not 
complete some command (because of unbalanced parentheses or quotation marks), 
on the next line a + symbol will appear instead of a prompt. 

R> cat("help me!" 
+ 

This can be quite frustrating for novice users, so it is better to know how to 
exit from this impasse. Depending on the implementation of R, pressing ctrl+c 
or esc on the keyboard usually helps. Otherwise, for GUI versions of R, pushing 
the ‘stop’ button of the R console will exit the parser. Of course, another 
solution is to complete the command (with a ‘)’ in our example). 

A.1.1 The workspace 

Almost every command in R creates an object and not just text output, and 
objects live in the workspace. The workspace, or groups of objects, can be 
saved and loaded into R with the save.image (or save) and load commands, 
respectively. The user is prompted about saving the workspace when exiting 
from R. This workspace is saved in the current directory as a file named 
.RData (hidden on some operating systems) and reloaded automatically the next 
time R is started from that directory. 
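
As a small illustration, the following sketch saves a single object to a file and loads it back. The object name w_demo is hypothetical and a temporary file is used, so that no .RData file is written in the current directory.

```r
w_demo <- c(2, 7, 4, 1)
f <- tempfile(fileext = ".RData")   # temporary file, removed with the session
save(w_demo, file = f)              # save one object rather than the whole workspace
rm(w_demo)                          # the object no longer exists
restored <- load(f)                 # load() returns the names of the restored objects
```

After load, the object w_demo is available again and restored contains its name.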

A.1.2 Graphics 

Usually R graphics are displayed on a device that corresponds to a window for 
a GUI version of R (for example, under MS-Windows, X11, or Mac OS X). 
Otherwise a Postscript file Rplots.ps is generated in the current working 
directory. Sometimes, in interactive uses of R, it is useful to use 
par(ask = TRUE) to pause R at each new plot or par(ask = FALSE) to avoid such 
pauses. We do not discuss the multiple R graphic systems here but the reader 
can refer to Murrel (2005) and Deepayan (2008). 

A.1.3 Getting help 

The casual user will find it very hard to get started without prior knowledge 
of which command is needed to perform a particular task, and the help system 
is of limited use in that situation. However, the R system is designed so that 
every command has its own help page and the documentation always matches the 
actual implementation of the command. To get information about a particular 
command one should use help, as in help(load) or ?help. For some special 
operators, the user should quote the argument like this: ?"for", ?"+", etc. 
When the documentation contains the section ‘Examples’ with R code inside, the 
code from that page can be executed automatically with example(topic), where 
topic is the corresponding R command of interest; try e.g. example(plot). 

In case one wants to execute a fuzzy search on the help system, one can use 
the command help.search("topic") and R will return several options which 
partly match the word topic; try e.g. help.search("regression"). It is also 
possible to extend the search for a term or for more complicated queries to 
the Web using RSiteSearch; try e.g. RSiteSearch("nonlinear regression"). 
The search will be extended to all documentation pages for packages in the R 
repository and to all the pertinent mailing lists.¹ The web site for R mailing 
lists and related projects is http://stat.ethz.ch/mailman/listinfo. 

¹ If you have a question, have a look at the mailing list archives first. 

There is a rich repository of quick guides and electronic books on R and its 
use in different disciplines, which can be found under the section 
‘Documentation/Contributed’ on CRAN. The direct link to the page is 
http://cran.r-project.org/other-docs.html. Finally, we mention the ‘Task 
Views’, which are collections of R packages organized by macro areas, e.g. 
‘Finance’. These are again hosted on CRAN and the direct link is 
http://cran.r-project.org/web/views. 

A.1.4 Installing packages 

This book, like many others, requires several add-on packages which are not 
distributed with the basic R system. The main repository for R packages is 
called CRAN, ‘The Comprehensive R Archive Network’, and its main address is 
http://cran.r-project.org/. To install a package in the R system, one should 
use the command install.packages with a package name as argument, i.e. 
install.packages("sde") to install the package sde. R GUIs usually offer some 
option to get the list of all packages at the repository (around 2000) and 
install those selected by point and click actions. 

Another important source of R packages is the R-Forge repository. It is a 
repository mainly for developers, but users can also find there pre-release or 
development versions of packages already on CRAN, or even packages not hosted 
on CRAN. The home page of R-Forge is http://r-forge.r-project.org. To install 
a package from R-Forge one can use a command like 

R> install.packages("sde", repos = "http://R-Forge.R-project.org") 

i.e. adding to install.packages the option repos with a proper web address. 


A.2 Objects 

As mentioned, most functions in R return objects rather than text output. Clearly 
objects can be created anew with commands. We now describe how to create, 
inspect and manipulate objects. 

A.2.1 Assignments 

To create an object it is necessary to use the operator ‘<-’, which has the 
meaning ‘assign the right-hand side to the left-hand side’, or the operator 
‘=’, which is more common in other languages, as in the following lines in 
which we create an object named x and assign the number 4 to it: 

R> x <- 4 
R> x = 4 

Similarly, one can use the operator ‘->’, which assigns the left-hand side to 
the right-hand side, i.e. 4 -> x. The following command creates a more 
interesting vector y containing the numbers 2, 7, 4, and 1 concatenated in a 
single object using the function c(): 

R> y <- c(2, 7, 4, 1) 
R> y 
[1] 2 7 4 1 

A matrix can be created using the matrix command 

R> z <- matrix(1:30, 5, 6) 
R> z 
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6   11   16   21   26
[2,]    2    7   12   17   22   27
[3,]    3    8   13   18   23   28
[4,]    4    9   14   19   24   29
[5,]    5   10   15   20   25   30

where 1:30 produces a sequence from 1 to 30 in unit steps, i.e. 

R> 1:30 
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30

The command matrix is called here with three arguments, where the second and 
third are the number of rows and columns and the first one is an object whose 
elements are used, and recycled if necessary, to fill the matrix column by 
column. Of course, we can create an empty matrix with 

R> matrix(, 5, 6) 
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   NA   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA   NA
[3,]   NA   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA   NA

where NA is the R symbol for missing values, or numerical vectors initialized 
to zero with numeric 

R> numeric(4) 
[1] 0 0 0 0 
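
The way matrix fills its elements can be checked on a small example. The sketch below uses only base R; the object names are arbitrary, and the argument byrow, not used in the text above, is shown as the standard way to fill by rows instead of by columns.

```r
m <- matrix(1:3, 2, 6)                     # 1:3 is recycled to fill 12 cells, by column
m_row <- matrix(1:6, 2, 3, byrow = TRUE)   # same function, filling row by row
e <- matrix(NA_real_, 2, 2)                # an 'empty' numeric matrix of missing values
```

Here the first column of m is (1, 2), the second is (3, 1), and so on, while the first row of m_row is (1, 2, 3).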

The command ls() shows the current content of the workspace 

R> ls() 
[1] "x" "y" "z" 

Notice that objects which are created but not assigned are not kept in the 
workspace. Like numeric, there are several commands to allocate objects for 
the different data types available in R, i.e. we have functions like integer, 
character, etc. Alternatively, one can use the function vector; the call 

R> vector(mode = "numeric", 4) 
[1] 0 0 0 0 

is equivalent to numeric(4). Objects can also have length zero 

R> w1 <- numeric(0) 
R> w1 
numeric(0) 

or can be initialized as NULL 

R> w2 <- NULL 
R> w2 
NULL 

which is useful if one wants to enlarge these objects later in subsequent 
tasks. In the above, w1 and w2 are objects of different types and, in 
particular, w1 is an object of class numeric while w2 is not. It is also 
possible to use the command assign to create objects, and this is sometimes 
useful when the name of the object has to be created dynamically in the R 
code. The following is an example in which o1 is created as before and o2 is 
created via assign 

R> o1 <- 1:4 
R> o1 
[1] 1 2 3 4 
R> ls() 
[1] "o1" "w1" "w2" "x"  "y"  "z" 
R> assign("o2", 5:8) 
R> ls() 
[1] "o1" "o2" "w1" "w2" "x"  "y"  "z" 
R> o2 
[1] 5 6 7 8 
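
The typical use of assign with dynamically created names can be sketched as follows. The names obj1, obj2 and obj3 are hypothetical; get and mget are the base R counterparts of assign which retrieve objects by name.

```r
# create obj1, obj2, obj3 with names built at run time
for (k in 1:3) {
  assign(paste0("obj", k), k^2)
}

second <- get("obj2")                             # retrieve one object by name
all_vals <- unlist(mget(paste0("obj", 1:3)))      # retrieve several objects at once
```

In practice a list indexed by name is often easier to manage than many dynamically named objects, but assign is convenient when the objects must live in the workspace.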
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A.2.2 Basic object types 

Object classes can be created from scratch in the R language, and this is 
usually the case for many R packages, but the basic classes are integer, 
numeric, complex, character, etc., which can be aggregated in vectors, 
matrixes, arrays or lists. While vectors, matrixes and arrays contain elements 
all of the same type, lists are more general: they can contain objects of 
different size and type and, in addition, can be nested. For example, the 
following code loads a data set, estimates a linear model via lm and assigns 
the result to an object mod. The statistical analysis per se is not relevant 
here; we just notice that an estimated regression model in R is not just an 
output of coefficients with their significance, but an object 

R> data(cars) 
R> mod <- lm(dist ~ speed, data = cars) 

R> mod 

Call: 

lm(formula = dist ~ speed, data = cars) 

Coefficients: 

(Intercept) speed 

-17.579 3.932 

We now look at the structure of the object created by the linear regression 
using the command str, which inspects the structure of an object. 

R> str(mod) 
List of 12
 $ coefficients : Named num [1:2] -17.58 3.93
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "speed"
 $ residuals    : Named num [1:50] 3.85 11.85 -5.95 12.05 2.12 ...
  ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
 $ effects      : Named num [1:50] -303.914 145.552 -8.115 9.885 0.194 ...
  ..- attr(*, "names")= chr [1:50] "(Intercept)" "speed" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:50] -1.85 -1.85 9.95 9.95 13.88 ...
  ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:50, 1:2] -7.071 0.141 0.141 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:50] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "speed"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.14 1.27
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-07
  ..$ rank : int 2
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 48
 $ xlevels      : list()
 $ call         : language lm(formula = dist ~ speed, data = cars)
 $ terms        :Classes 'terms', 'formula' length 3 dist ~ speed
  .. ..- attr(*, "variables")= language list(dist, speed)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "dist" "speed"
  .. .. .. ..$ : chr "speed"
  .. ..- attr(*, "term.labels")= chr "speed"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
  .. ..- attr(*, "predvars")= language list(dist, speed)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "dist" "speed"
 $ model        :'data.frame': 50 obs. of 2 variables:
  ..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
  ..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 dist ~ speed
  .. .. ..- attr(*, "variables")= language list(dist, speed)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "dist" "speed"
  .. .. .. .. ..$ : chr "speed"
  .. .. ..- attr(*, "term.labels")= chr "speed"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
  .. .. ..- attr(*, "predvars")= language list(dist, speed)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "dist" "speed"
 - attr(*, "class")= chr "lm"


From the above we see that mod is essentially a list object of 12 elements 
and that it is of class ‘lm’ (for linear models). For example, the first 
element is called coefficients and can be accessed using the symbol $ as 
follows: 

R> mod$coefficients 
(Intercept)       speed 
 -17.579095    3.932409 
R> str(mod$coefficients) 
 Named num [1:2] -17.58 3.93
 - attr(*, "names")= chr [1:2] "(Intercept)" "speed"

The vector coefficients is a ‘named vector’. One can obtain the names of the 
elements of a vector with 

R> names(mod$coefficients) 

[1] "(Intercept)" "speed" 

or change them with 


R> names(mod$coefficients) <- c("alpha", "beta") 

R> mod$coefficients 

alpha beta 

-17.579095 3.932409 

Similarly, one can assign or get the names of the rows or the columns of an 
R matrix 

R> z 
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6   11   16   21   26
[2,]    2    7   12   17   22   27
[3,]    3    8   13   18   23   28
[4,]    4    9   14   19   24   29
[5,]    5   10   15   20   25   30
R> rownames(z) <- c("a", "b", "c", "d", "e") 
R> colnames(z) <- c("A", "B", "C", "D", "E", "F") 
R> z 
  A  B  C  D  E  F
a 1  6 11 16 21 26
b 2  7 12 17 22 27
c 3  8 13 18 23 28
d 4  9 14 19 24 29
e 5 10 15 20 25 30


As anticipated, lists can be nested. For example, the object model inside mod is 
itself a list 

R> str(mod$model) 
'data.frame': 50 obs. of 2 variables:
 $ dist : num 2 10 4 22 16 10 18 26 34 17 ...
 $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
 - attr(*, "terms")=Classes 'terms', 'formula' length 3 dist ~ speed
  .. ..- attr(*, "variables")= language list(dist, speed)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "dist" "speed"
  .. .. .. ..$ : chr "speed"
  .. ..- attr(*, "term.labels")= chr "speed"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
  .. ..- attr(*, "predvars")= language list(dist, speed)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "dist" "speed"

or, more precisely, a data.frame, which is essentially a list with the 
property that all the elements have the same length. The data.frame object is 
used to store data sets, like the cars data set 

R> str(cars) 
'data.frame': 50 obs. of 2 variables:
 $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
 $ dist : num 2 10 4 22 16 10 18 26 34 17 ...

and it is assumed that the elements of a data.frame correspond to variables, 
while the length of each element is the same as the sample size. 
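
The two properties just described, nesting of lists and the list nature of a data.frame, can be verified directly. The sketch below uses only base R and the cars data set; the object nested and its element names are hypothetical.

```r
# a list containing elements of different types, one of which is itself a list
nested <- list(a = 1:3, b = list(label = "text", m = matrix(0, 2, 2)))
inner <- nested$b$label          # access an element of the inner list

is_list <- is.list(cars)         # a data.frame is a list ...
lens <- sapply(cars, length)     # ... whose elements all have the same length
```

Here lens contains the value 50 for both speed and dist, i.e. the sample size of the cars data set.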

A.2.3 Accessing objects and subsetting 

We have seen that $ can be used to access the elements of a list and hence of 
a data.frame, but R also offers operators for enhanced subsetting. The first 
one is [, which returns an object of the same type as the original object 

R> y 
[1] 2 7 4 1 
R> y[2:3] 
[1] 7 4 
R> str(y) 
 num [1:4] 2 7 4 1 
R> str(y[2:3]) 
 num [1:2] 7 4 

or, for matrix-like objects 

R> z 
  A  B  C  D  E  F
a 1  6 11 16 21 26
b 2  7 12 17 22 27
c 3  8 13 18 23 28
d 4  9 14 19 24 29
e 5 10 15 20 25 30
R> z[1:2, 5:6] 
   E  F
a 21 26
b 22 27

and subsetting can also occur on nonconsecutive indexes 

R> z[1:2, c(1, 3, 6)] 
  A  C  F
a 1 11 26
b 2 12 27

or in a different order 

R> z[1:2, c(6, 5, 4)] 
   F  E  D
a 26 21 16
b 27 22 17

One can also subset objects using names, e.g. 

R> z[c("a", "c"), "D"] 
 a  c 
16 18 

We can also use a syntax like 

R> z["c", ] 
 A  B  C  D  E  F 
 3  8 13 18 23 28 

leaving one argument out to mean ‘run over all the elements’ for that index. 
Further, R allows for negative indexes, which are used to exclude elements 

R> z[c(-1, -3), ] 
  A  B  C  D  E  F
b 2  7 12 17 22 27
d 4  9 14 19 24 29
e 5 10 15 20 25 30

but positive and negative indexes cannot be mixed. 
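
The last remark can be checked in a short, self-contained sketch. The matrix z is rebuilt here so that the code runs on its own, and try is used only to capture the error raised when positive and negative indexes are mixed.

```r
z <- matrix(1:30, 5, 6, dimnames = list(letters[1:5], LETTERS[1:6]))

kept <- z[-c(1, 3), ]                        # exclude the first and third rows
mixed <- try(z[c(-1, 2), ], silent = TRUE)   # mixing signs is an error
failed <- inherits(mixed, "try-error")
```

Here kept contains the rows b, d and e, while failed is TRUE because the mixed subscript is rejected.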
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The subsetting operator [ also works for lists 


R> a <- mod[1:2] 
R> str(a) 
List of 2
 $ coefficients: Named num [1:2] -17.58 3.93
  ..- attr(*, "names")= chr [1:2] "alpha" "beta"
 $ residuals   : Named num [1:50] 3.85 11.85 -5.95 12.05 2.12 ...
  ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...

where we have extracted the first two elements of the list mod using mod[1:2]. 
We can use names as well and the commands below return the same objects 

R> str(mod["coefficients"]) 
List of 1
 $ coefficients: Named num [1:2] -17.58 3.93
  ..- attr(*, "names")= chr [1:2] "alpha" "beta"
R> str(mod[1]) 
List of 1
 $ coefficients: Named num [1:2] -17.58 3.93
  ..- attr(*, "names")= chr [1:2] "alpha" "beta"

Notice that mod[1] returns a list with one element and not just the element 
inside the list. To extract the element itself one should use the subsetting 
operator [[. The next group of commands returns the element inside the list 

R> str(mod[[1]]) 
 Named num [1:2] -17.58 3.93
 - attr(*, "names")= chr [1:2] "alpha" "beta"
R> str(mod[["coefficients"]]) 
 Named num [1:2] -17.58 3.93
 - attr(*, "names")= chr [1:2] "alpha" "beta"
R> str(mod$coefficients) 
 Named num [1:2] -17.58 3.93
 - attr(*, "names")= chr [1:2] "alpha" "beta"
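
The difference between [ and [[ can also be seen on a small hand-made list, without fitting a model. The list lst below is hypothetical and only mimics the first elements of mod.

```r
lst <- list(coefficients = c(alpha = -17.58, beta = 3.93), rank = 2L)

sub_list <- lst["coefficients"]     # '[' returns a list of length one
element  <- lst[["coefficients"]]   # '[[' returns the element itself
same <- identical(element, lst$coefficients)   # '$' behaves like '[['
```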

We have mentioned that a data.frame looks like a particular list, but with more 
structure, and that it is used to store data sets. The latter are usually 
thought of as matrixes and indeed it is possible to access the elements of a 
data.frame using the subsetting rules for matrixes, i.e. 

R> cars[, 1] 
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25

which is equivalent to the following 

R> cars$speed 
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25
R> cars[, 1] 
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25

Notice that the output is not a data.frame, while² 

R> str(cars[1]) 
'data.frame': 50 obs. of 1 variable:
 $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
R> head(cars[1]) 
  speed
1     4
2     4
3     7
4     7
5     8
6     9

is a proper (sub) data.frame, although the matrix-like subsetting operator has 
a different behaviour if used on columns or rows: cars[, 1] returns the element 
but, for example, 

R> cars[1:3, ] 
  speed dist
1     4    2
2     4   10
3     7    4

returns a data.frame with the selected number of rows and all columns. 
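
The different behaviour on rows and columns can be summarized in a few lines. The argument drop = FALSE, not discussed above, is the standard base R way to keep a data.frame when a single column is selected with the matrix-like syntax.

```r
col_vec <- cars[, 1]                 # a single column is simplified to a vector
col_df  <- cars[, 1, drop = FALSE]   # drop = FALSE keeps the data.frame structure
row_df  <- cars[1:3, ]               # selecting rows always returns a data.frame
```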


² The commands head and tail show the first and last rows of a data.frame, respectively. 

A.2.4 Coercion between data types 

Functions like names and colnames, but also levels, attributes, etc., are used 
to retrieve and set properties of objects and are called accessor functions. 
Objects can be transformed from one type to another using functions with names 
of the form as.*. For example, as.integer transforms an object into an integer 
whenever possible, or otherwise returns a missing value 

R> pi 
[1] 3.141593 
R> as.integer(pi) 
[1] 3 
R> as.integer("3.14") 
[1] 3 
R> as.integer("a") 
[1] NA 

Other examples are as.data.frame, to transform matrix objects into true 
data.frame objects, and as.matrix for the reverse operation. For more complex 
classes one can also try the generic function as. 
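
A few more coercions, in both directions between matrixes and data.frames, are sketched below using only base R. The object names are arbitrary; suppressWarnings is used because a failed coercion returns NA together with a warning.

```r
m <- matrix(1:6, 3, 2)
df <- as.data.frame(m)      # matrix -> data.frame (columns are named V1, V2)
back <- as.matrix(df)       # data.frame -> matrix

num <- as.numeric("3.14")                  # character -> numeric
bad <- suppressWarnings(as.integer("a"))   # impossible coercion gives NA
```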


A.3 S4 objects 

We have used the term ‘class’ for R objects several times. This is because 
each object in R belongs to some class and for each class there exist generic 
functions, called methods, which perform some task on that object. For 
example, the function summary provides summary statistics which are 
appropriate for the given object 

R> summary(cars) 
     speed           dist       
 Min.   : 4.0   Min.   :  2.00  
 1st Qu.:12.0   1st Qu.: 26.00  
 Median :15.0   Median : 36.00  
 Mean   :15.4   Mean   : 42.98  
 3rd Qu.:19.0   3rd Qu.: 56.00  
 Max.   :25.0   Max.   :120.00  
R> summary(mod) 

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.069  -9.525  -2.272   9.215  43.201 

Coefficients:
      Estimate Std. Error t value Pr(>|t|)    
alpha -17.5791     6.7584  -2.601   0.0123 *  
beta    3.9324     0.4155   9.464 1.49e-12 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438 
F-statistic: 89.57 on 1 and 48 DF, p-value: 1.490e-12 

The standard set of classes and methods in R is called S3. In this framework, a 
method for an object of some class is simply an R function named method, class, 
e.g. summary. lm is the function which is called by R when the function summary 

is called with an argument which is an object of class lm. In R methods like sum¬ 

mary are very generic and the function methods provides a list of specific methods 
(which apply to specific types of objects) for some particular method. For example 

R> methods(summary) 


 [1] summary.aov            summary.aovlist        summary.aspell*
 [4] summary.connection     summary.data.frame     summary.Date
 [7] summary.default        summary.ecdf*          summary.factor
[10] summary.glm            summary.infl           summary.lm
[13] summary.loess*         summary.manova         summary.matrix
[16] summary.mlm            summary.nls*           summary.packageStatus
[19] summary.POSIXct        summary.POSIXlt        summary.ppr*
[22] summary.prcomp*        summary.princomp*      summary.stepfun
[25] summary.stl*           summary.table          summary.tukeysmooth*

   Non-visible functions are asterisked
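As a minimal sketch of the S3 naming mechanism, one can define a new class and a summary method for it. The class name payoff and its fields are hypothetical and chosen only for illustration.

```r
# Constructor for a hypothetical S3 class "payoff"
payoff <- function(values, strike) {
  structure(list(values = values, strike = strike), class = "payoff")
}

# S3 method: the generic summary() dispatches here for objects of class "payoff"
summary.payoff <- function(object, ...) {
  c(strike = object$strike,
    mean = mean(object$values),
    max = max(object$values))
}

p <- payoff(c(0, 10, 40), strike = 110)
s <- summary(p)   # calls summary.payoff because class(p) is "payoff"
s
```

Nothing beyond the function name links summary.payoff to the class, which is exactly why the dot convention can be ambiguous.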


The dot naming convention is rather unfortunate because one can create
functions which look like methods but are not: for example, the t.test function
is not the method t for objects of class test; it is just an R function
which performs the ordinary two-sample t test. The new system of classes and
methods, which is now fully implemented in R, is called S4. S4 objects
apparently behave like all other objects in R, but they possess properties called
‘slots’, which are accessed differently from the components of other R objects.
The next code computes the maximum likelihood estimate of the mean of a Gaussian
law. It uses the function mle from the package stats4 which is an S4 package as the
name suggests. Again, we are interested in the statistical part of this example

R> require(stats4) 

R> set.seed(123) 
R> y <- rnorm(100, mean = 1.5) 

R> f <- function(theta = 0) -sum(dnorm(x = y, mean = theta, 
+     log = TRUE)) 

R> fit <- mle(f) 

R> fit 

Call: 

mle(minuslogl = f) 

Coefficients: 

theta 

1.590406 

We now have a look at the object fit returned by the mle function:

R> str(fit) 

Formal class 'mle' [package "stats4"] with 8 slots
  ..@ call     : language mle(minuslogl = f)
  ..@ coef     : Named num 1.59
  .. ..- attr(*, "names")= chr "theta"
  ..@ fullcoef : Named num 1.59
  .. ..- attr(*, "names")= chr "theta"
  ..@ vcov     : num [1, 1] 0.01
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "theta"
  .. .. ..$ : chr "theta"
  ..@ min      : num 133
  ..@ details  :List of 6
  .. ..$ par        : Named num 1.59
  .. .. ..- attr(*, "names")= chr "theta"
  .. ..$ value      : num 133
  .. ..$ counts     : Named int [1:2] 6 3
  .. .. ..- attr(*, "names")= chr [1:2] "function" "gradient"
  .. ..$ convergence: int 0
  .. ..$ message    : NULL
  .. ..$ hessian    : num [1, 1] 100
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr "theta"
  .. .. .. ..$ : chr "theta"
  ..@ minuslogl:function (theta = 0)
  ..@ method   : chr "BFGS"

We now see that this is an S4 object with slots that, as the structure suggests, 
can be accessed using the symbol @ instead of $. For example, 

R> fit@coef 

theta 

1.590406 

To get the list of methods for S4 objects one should use the function showMethods:

R> showMethods(summary) 

Function: summary (package base) 
object="ANY" 
object="mle" 
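To see slots and S4 dispatch on a small scale, the following sketch defines a new S4 class with setClass and a method with setMethod. The class CallOption, its slots and the generic payoffAt are hypothetical names introduced only for this illustration.

```r
library(methods)

# Hypothetical S4 class describing a call option
setClass("CallOption", representation(strike = "numeric", maturity = "numeric"))

opt <- new("CallOption", strike = 110, maturity = 0.5)
opt@strike                 # slots are accessed with @
slotNames("CallOption")

# A generic function and an S4 method specialised for the class
setGeneric("payoffAt", function(object, x) standardGeneric("payoffAt"))
setMethod("payoffAt", "CallOption",
          function(object, x) pmax(x - object@strike, 0))

payoffAt(opt, c(100, 120, 150))
```

Calling showMethods("payoffAt") afterwards lists the method registered for the class.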


A.4 Functions 

In the previous section we created a new function called f to define the
negative log-likelihood of the data. In R, functions are created with the command
function followed by a list of arguments, and the body of the function (if longer
than one line) has to be enclosed within ‘{’ and ‘}’, as in the next example in
which we define the payoff function of a call option

R> g <- function(x, K = 110) { 
+     max(x - K, 0) 
+ } 

The function returns the value of the last evaluated expression unless the command
return is used. By default, in the function g we have set the strike price K = 110,
and x is the argument which represents the price of the underlying asset. 

R> g(120) 
[1] 10 

R> g(99) 
[1] 0 

R> g(115, 120) 
[1] 0 

In R, arguments can always be passed by name, so the function can be called with
its arguments in any order provided they are named, e.g. 

R> g(150, 120) 
[1] 30 

R> g(K = 120, x = 150) 
[1] 30 

In the definition of g we have fixed the default value of the argument K to 110, 
so if it is missing in a call, R replaces it with its default value. The 
argument x has no default and cannot be omitted; therefore a call like
g(K = 120) will produce an error. 
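The behavior of a missing mandatory argument can be checked without interrupting a session by trapping the error. This is only a sketch: the function g is redefined here so that the block is self-contained, and the use of tryCatch is our choice, not something required by the text.

```r
# Payoff of a call option with default strike K = 110
g <- function(x, K = 110) {
  max(x - K, 0)
}

# x has no default, so omitting it raises an error when x is first used;
# tryCatch() captures the error message instead of stopping
res <- tryCatch(g(K = 120), error = function(e) conditionMessage(e))
res

# With both arguments named, the order is irrelevant
g(K = 120, x = 150)
```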


A.5 Vectorization 
Most R functions are vectorized, which means that if a vector is passed to a 
function, the function is applied to each element of the vector and a vector of 
results is returned, as in the next example: 

R> set.seed(123) 

R> x <- runif(5, 90, 150) 

R> x 

[1] 107.2547 137.2983 114.5386 142.9810 146.4280 
R> sin(x) 

[1] 0.4263927 -0.8026760 0.9916244 -0.9992559 0.9414204 

But functions have to be written with vectorization in mind. For example, our
function g is not vectorized: 

R> g(x) 

[1] 36.42804 

Indeed, in the body of g the function max is used and it operates as follows: first 
x - K is calculated, with K = 110 by default: 

R> x - 110 

[1] -2.745349 27.298308 4.538615 32.981044 36.428037 

and then max calculates the maximum of the whole vector c(x - 110, 0), which is a
single number. To vectorize it we can use the function sapply as follows: 

R> g1 <- function(x, K = 110) { 
+     sapply(x, function(x) max(x - K, 0)) 
+ } 

R> x 
[1] 107.2547 137.2983 114.5386 142.9810 146.4280 

R> g1(x) 
[1] 0.000000 27.298308 4.538615 32.981044 36.428037 

and we get five different payoffs. The functions of the *apply family are designed
to work iteratively on different objects. The function sapply iterates over the
vector in its first argument and applies to each element the function in its second
argument. The function apply works on arrays (e.g. matrices), lapply iterates over
lists, etc. 

The usual for and while constructs exist in R as well, but their use should 
be limited to truly iterative tasks which cannot be parallelized as in our example. 
A for version of the function g can be the following: 

R> g2 <- function(x, K = 110) { 
+     n <- length(x) 
+     val <- numeric(n) 
+     for (i in 1:n) val[i] <- max(x[i] - K, 0) 
+     val 
+ } 

or, in a more R-like fashion, as follows: 

R> g3 <- function(x, K = 110) { 
+     val <- NULL 
+     for (u in x) val <- c(val, max(u - K, 0)) 
+     val 
+ } 

R> g1(x) 
[1] 0.000000 27.298308 4.538615 32.981044 36.428037 

R> g2(x) 
[1] 0.000000 27.298308 4.538615 32.981044 36.428037 

R> g3(x) 
[1] 0.000000 27.298308 4.538615 32.981044 36.428037 

The vectorized versions are usually faster than the ones iterated using for 
loops: 

R> y <- runif(10000, 90, 150) 

R> system.time(g1(y)) 
   user  system elapsed 
  0.034   0.001   0.035 

R> system.time(g2(y)) 
   user  system elapsed 
  0.051   0.003   0.054 

R> system.time(g3(y)) 
   user  system elapsed 
  0.261   0.068   0.344 

Notice that the function g3 is particularly inefficient because, instead of
allocating the result vector once and assigning into it, it grows the vector val
dynamically. 
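For this particular payoff, a fully vectorized alternative is available in base R: pmax computes the element-wise maximum, so no explicit iteration is needed. The name g4 is ours and does not appear in the text; this is a sketch of the idea rather than a benchmark.

```r
# Element-wise maximum between x - K and 0: vectorized in x (and in K)
g4 <- function(x, K = 110) pmax(x - K, 0)

set.seed(123)
x <- runif(5, 90, 150)
g4(x)

# Timing on a longer vector; the figures depend on the machine
y <- runif(10000, 90, 150)
system.time(g4(y))
```

Because pmax works on whole vectors in compiled code, it avoids the per-element function calls made by the sapply-based and loop-based versions.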




A.6 Parallel computing in R 

There are several options for parallel computing in R. As usual, each solution has 
pros and cons, and here we present some options which appear to be the simplest 
to use. The first option for the casual user is the package snow (Simple 
Network of Workstations) by Luke Tierney. This package allows the creation 
of cross-platform clusters (i.e., the nodes of the cluster may be on different 
platforms) very easily. The very first application of the package is to exploit 
the power of today’s CPUs, which are usually multicore. We assume a dual core 
machine in the next example, and start a cluster with two nodes, one for each 
core in our CPU. 

R> library(snow) 

R> cl <- makeSOCKcluster(c("localhost", "localhost")) 


The cluster has been created over sockets, a mechanism for interprocess
communication provided by the operating system, which is not particularly
efficient; localhost means ‘this machine’. It is also possible to start a cluster with 

R> makeCluster(2) 


or 


R> makeCluster(2, type = "SOCK") 

in an interactive session because, usually, the socket connection is the default 
connection type, so all the above are essentially equivalent. The next code splits 
the simulation of 100 paths of an Ornstein-Uhlenbeck process across 
the two nodes. For this we prepare a function f which simulates a number x of 
trajectories on each node 

R> f <- function(x) { 

+ require(sde) 

+     sde.sim(model = "OU", theta = c(1, 1, 1), M = x) 

+ } 

and apply this function to the nodes of the cluster in this way: 

R> tmp <- parLapply(cl, c(50, 50), f) 

R> str(tmp) 

List of 2
 $ : mts [1:101, 1:50] 1 1.136 1.038 0.865 0.724 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ...
  ..- attr(*, "tsp")= num [1:3] 0 1 100
  ..- attr(*, "class")= chr [1:2] "mts" "ts"
 $ : mts [1:101, 1:50] 1 0.861 0.693 0.653 0.705 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ...
  ..- attr(*, "tsp")= num [1:3] 0 1 100
  ..- attr(*, "class")= chr [1:2] "mts" "ts"

and we should not forget to stop the cluster 

R> stopCluster(cl) 

Compare the result with a single call 


R> tmp <- f(100) 

To check the errata corrige of the book, type 
vignette("sde.errata") 

R> str(tmp) 

 mts [1:101, 1:100] 1 0.963 0.88 0.93 0.912 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:100] "X1" "X2" "X3" "X4" ...
 - attr(*, "tsp")= num [1:3] 0 1 100
 - attr(*, "class")= chr [1:2] "mts" "ts"

When doing parallel computing it is necessary to take care that the random number 
generators provide independent streams of numbers on each node, otherwise 
a typical Monte Carlo analysis can be invalidated. The package rlecuyer can 
provide such functionality to R and can be used directly in a cluster created with 
snow via the function clusterSetupRNG. The following is an example of use: 

R> cl <- makeSOCKcluster(c("localhost", "localhost")) 

R> clusterSetupRNG(cl, seed = rep(123, 2)) 

[1] "RNGstream" 

R> tmp <- parLapply(cl, c(50, 50), f) 

R> stopCluster(cl) 
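The same pattern can be sketched with the parallel package that ships with current versions of R; this is an assumption on our side, since the text uses snow and rlecuyer. Here clusterSetRNGStream plays the role of clusterSetupRNG, and rnorm is only a stand-in for the simulation function f.

```r
library(parallel)   # bundled with R; used here in place of snow

cl <- makeCluster(2)                   # two local worker processes
clusterSetRNGStream(cl, iseed = 123)   # independent L'Ecuyer streams, one per worker

# Each worker draws 50 standard normal variates (stand-in for sde.sim)
tmp <- parLapply(cl, c(50, 50), function(n) rnorm(n))

stopCluster(cl)                        # always release the workers
str(tmp)
```

With independent streams the two chunks differ, while repeating the whole block with the same iseed reproduces the same numbers.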

The socket method is not very efficient when large amounts of data need to be
passed. The package snow can also run clusters smoothly using other methods
like MPI (Message-Passing Interface) via the package Rmpi, which supports 
MPICH, MPICH2, LAM-MPI, Deino MPI and Open MPI, or PVM (Parallel 
Virtual Machine) via the package rpvm, or NWS (NetWorkSpaces) via the 
nws package. All these options are more powerful than socket connections, but 
require a bit of fine tuning of the hardware which cannot be discussed here. 
More details can be found in Rossini et al. (2007, 2008). The packages Rmpi, 
rpvm and nws all work independently of the package snow. The package snowfall,
although based on snow, has some additional functionalities which make 
scripting of parallelized R code easier and also allow us to write the same code 
with or without parallelization using a simple switch. It is also better in that 
it creates global variables that are passed to all nodes when the cluster 
is started, and it offers more effective debugging tools. We provide an example of
use based on the previous code. The cluster is started and stopped using sfInit and
sfStop respectively. 


R> require(snowfall) 

R> sfInit(parallel = TRUE, cpus = 2) 

R Version: R version 2.10.1 (2009-12-14) 

R> cl2 <- sfGetCluster() 

R> clusterSetupRNG(cl2, seed = rep(123, 2)) 

[1] "RNGstream" 

R> tmp <- parLapply(cl2, c(50, 50), f) 

R> sfStop() 

R> str(tmp) 

List of 2 

$ : mts [1:101, 1:50] 1 0.985 1.044 1.118 1.104 ... 

..- attr(*, "dimnames")=List of 2 
.. ..$ : NULL 

.. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ... 

..- attr(*, "tsp")= num [1:3] 0 1 100 
..- attr(*, "class")= chr [1:2] "mts" "ts" 

$ : mts [1:101, 1:50] 1 1.052 1.001 0.994 0.983 ... 

..- attr(*, "dimnames")=List of 2 
.. ..$ : NULL 

.. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ... 

..- attr(*, "tsp")= num [1:3] 0 1 100 
..- attr(*, "class")= chr [1:2] "mts" "ts" 

Notice that the code to execute commands on the nodes of the cluster is the same 
as in snow. Other packages implement implicit parallelization in which the code 
is automatically distributed to the different nodes of the cluster (or to different
CPUs, as the multicore package does). For an updated review we suggest you look at
the Task View on CRAN named ‘HighPerformanceComputing’. 


A.6.1 The foreach approach 

The foreach package merits particular attention. This package automatically 
distributes the parallelized tasks to the nodes of a cluster in a way that makes 
writing the code quite simple. A further advantage is that, when the cluster 
is not available, the code still works but runs sequentially. We start by 
writing some simple code which makes use of the foreach command and the 
%dopar% operator 
R> require(foreach) 

R> set.seed(123) 

R> tmp <- foreach(i = rep(50, 2)) %dopar% f(i) 

R> str(tmp) 

List of 2 

$ : mts [1:101, 1:50] 1 0.944 0.97 0.9 0.979 ... 

..- attr(*, "dimnames")=List of 2 
.. ..$ : NULL 

.. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ... 

..- attr(*, "tsp")= num [1:3] 0 1 100 
..- attr(*, "class")= chr [1:2] "mts" "ts" 

$ : mts [1:101, 1:50] 1 0.951 0.69 0.718 0.658 ... 

..- attr(*, "dimnames")=List of 2 
.. ..$ : NULL 

.. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ... 

..- attr(*, "tsp")= num [1:3] 0 1 100 
..- attr(*, "class")= chr [1:2] "mts" "ts" 

R> getDoParWorkers() 

[1] 1 

In the above case, the foreach command executes the function f several times,
with the argument i varying in the set c(50, 50). So it calls f twice,
sequentially. The function getDoParWorkers tells us the number of nodes in the
cluster. With the foreach approach, a cluster can be a rather generic one, as in
the snow package. As before, we first set up a cluster using the snowfall package: 


R> require(snowfall) 

R> sfInit(parallel = TRUE, cpus = 2) 

R> cl2 <- sfGetCluster() 

R> clusterSetupRNG(cl2, seed = rep(123, 2)) 

[1] "RNGstream" 

Then, we load the foreach package and the doSNOW package, which is used to 
inform the foreach package which parallel back end should be used by %dopar%. 
This is done using the function registerDoSNOW 


R> require(foreach) 

R> require(doSNOW) 

R> registerDoSNOW(cl2) 


For other cluster structures or back ends, the user needs to provide the
corresponding registration functions. Fortunately, there are already several ready
to use solutions. We will show an example later. Let us continue and make use of
the foreach functionality with the new parallel back end 

R> tmp <- foreach(i = rep(50, 2)) %dopar% f(i) 

R> str(tmp) 






List of 2 

$ : mts [1:101, 1:50] 1 0.985 1.044 1.118 1.104 ... 

..- attr(*, "dimnames")=List of 2 
.. ..$ : NULL 

.. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ... 

..- attr(*, "tsp")= num [1:3] 0 1 100 
..- attr(*, "class")= chr [1:2] "mts" "ts" 

$ : mts [1:101, 1:50] 1 1.052 1.001 0.994 0.983 ... 

..- attr(*, "dimnames")=List of 2 
.. ..$ : NULL 

.. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ... 

..- attr(*, "tsp")= num [1:3] 0 1 100 
..- attr(*, "class")= chr [1:2] "mts" "ts" 

R> getDoParWorkers() 

[1] 2 

Now we need to stop the cluster with sfStop but also inform foreach that 
parallel execution is no longer possible. We do this by registering the sequential 
back end with the command registerDoSEQ 

R> sfStop() 

R> registerDoSEQ() 

R> getDoParWorkers() 

[1] 1 

Now, if we run again the foreach statement, it will be executed sequentially. 

R> tmp <- foreach(i = rep(50, 2)) %dopar% f(i) 

A similar approach can be followed using a cluster created with the multicore
package along with the doMC package to inform foreach that a new cluster is in 
place: 


R> require(doMC) 

R> registerDoMC() 

R> options(cores = 2) 

R> tmp <- foreach(i = rep(50, 2)) %dopar% f(i) 

R> str(tmp) 

List of 2 

$ : mts [1:101, 1:50] 1 0.878 0.855 0.724 0.798 ... 
..- attr(*, "dimnames")=List of 2 
.. ..$ : NULL 

.. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ... 

..- attr(*, "tsp")= num [1:3] 0 1 100 
..- attr(*, "class")= chr [1:2] "mts" "ts" 

$ : mts [1:101, 1:50] 1 1.136 1.03 0.989 0.903 ... 
..- attr(*, "dimnames")=List of 2 
.. ..$ : NULL 
.. ..$ : chr [1:50] "X1" "X2" "X3" "X4" ... 

..- attr(*, "tsp")= num [1:3] 0 1 100 
..- attr(*, "class")= chr [1:2] "mts" "ts" 

R> getDoParWorkers() 

[1] 2 

R> registerDoSEQ() 

R> getDoParWorkers() 

[1] 1 

Notice that all the above code fragments have the foreach statement in common. So 
writing code in this way makes the software ready to run on most cluster 
structures out of the box, provided we have prepared it correctly. 
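By default foreach returns a list, but its .combine argument can collect the results into a vector. The following sketch runs on the sequential back end and is guarded so that it does nothing when the foreach package is not installed; the prices and the strike 110 are illustrative values.

```r
# Runs only if the foreach package is available
if (requireNamespace("foreach", quietly = TRUE)) {
  library(foreach)
  registerDoSEQ()            # explicit sequential back end

  spot <- c(100, 120, 150)   # illustrative prices of the underlying
  # .combine = c concatenates the results into a vector instead of a list
  payoffs <- foreach(s = spot, .combine = c) %dopar% max(s - 110, 0)
  print(payoffs)
  print(getDoParWorkers())
}
```

Registering a parallel back end before the foreach call is the only change needed to distribute the same loop over several workers.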

A.6.2 A note of warning on the multicore package 

The multicore package uses the ‘fork’ system call to spawn a copy of the current 
process which performs the computations in parallel. Modern operating systems 
use the copy-on-write approach which makes this very appealing for parallel 
computation since only objects modified during the computation will be actually 
copied and all other memory is directly shared. 

However, the copy shares everything including any user interface elements 
(windows, menus, etc.). This may cause the above example to execute rather 
slowly if you try it from the R GUI. So the preferred use of the multicore package 
is in command line scripts. This may appear a trivial statement because intensive 
computation is usually done in a batch environment, but it is still good to know 
if you don’t want to be disappointed by using the multicore package in the wrong 
environment. 


A.7 Bibliographical notes 

There are many basic books apart from the one mentioned earlier (Dalgaard 
2008), such as Crawley (2007), which cover the basic functionalities of the R 
language. A simple search with the keyword R in on-line book stores will return 
hundreds of titles. For advanced programming techniques on the standard S 
language we should mention Chambers (2004) and Venables and Ripley (2000). 
For S4 programming some recent references are Chambers (2008) and Gentleman 
(2008). For advanced graphics one should not miss the books of Murrell (2005) 
and Deepayan (2008). 
Appendix B 

R in finance 


Finance includes many subfields. In this book we considered only option pricing, 
econometric estimation and simulation of financial models and analysis of financial
data. But finance also includes important fields like trading and portfolio
optimization. The family of R packages offers several opportunities in this direction. 

For example, the fPortfolio package from Rmetrics implements Markowitz 
Portfolio Theory, Mean-Variance Frontiers, Mean-CVaR, etc., but also offers 
a framework for backtesting analysis. The fPortfolio package comes with an 
additional ebook, Würtz et al. (2009), which is worth reading. The package portfolio 
focuses on equity portfolio strategies and also implements matching portfolios 
for benchmark comparisons. Another interesting solution in this direction is the 
backtest package. We should also mention PerformanceAnalytics, which is a 
library of functions designed for evaluating the performance and risk characteristics 
of financial assets or funds. Another growing library of packages is dedicated 
to trading. We mention just a few: fTrading for basic trading analysis; TTR to 
construct technical trading rules and ttrTests for testing these rules; IBrokers, 
which is an API to interact with the Interactive Brokers Trader Workstation. 
For risk management analysis, apart from a combination of the above mentioned 
packages, one can check the VaR package for Value-at-Risk analysis and the
CreditMetrics package, which implements the functionalities of the CreditMetrics
risk model. For more information on other R tools for finance, we suggest looking
at the Task View on Finance available at
http://cran.r-project.org/web/views/Finance.html. 
We now focus on what is strictly related to this book. 

B.1 Overview of existing R frameworks 

Although the R community has moved only recently to finance, there is a growing
number of tools appearing every day. We briefly describe here two large 
collections which originate from different approaches. 


Option Pricing and Estimation of Financial Models with R, First Edition. Stefano M. Iacus. 

© 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd. ISBN: 978-0-470-74584-7 
B.1.1 Rmetrics 

The first large suite of tools is the Rmetrics library which we have already used 
several times in the book, for example in Chapters 6 and 7. The Rmetrics project 
is now supported by the nonprofit Rmetrics Association which also publishes 
several ebooks on, but not limited to, finance. The packages of this collection 
were initially designed for teaching purposes but later evolved in the direction 
of being a more operational suite of tools. Of direct interest to the reader of this 
book are the packages: 

• fBasics: which is a collection of functions to explore and investigate basic 
properties of financial returns and related quantities. The fields covered 
include techniques of explorative data analysis and the investigation of 
distributional properties, including parameter estimation and hypothesis 
testing; 

• fOptions: a library of functions for the pricing of basic European and
American put and call options; 

• fAsianOptions: includes different approximation methods for the pricing 
of Asian options in the Black and Scholes model; 

• fExoticOptions: standard Asian option pricing, as well as barrier options, 
binary options, lookback options, etc.; 

• timeDate: a framework for chronological and calendar objects which we 
will discuss below. 

This small list is only a part of the suite specifically focused on option pricing. 

B.1.2 RQuantLib 

The RQuantLib package is an R interface to the QuantLib C++ library from the
project of the same name. The open source QuantLib project¹ aims to provide a
comprehensive software framework for quantitative finance. It is a low level
framework written in C++. Loading the RQuantLib package is not easy for the average
R user, because it requires a working GCC compiler, the preliminary installation of
the boost framework and finally the installation of the quantlib library. These steps 
are made easier for users of Debian OS, but in general they require the usual skills 
of installing applications from source code. For this reason, we haven’t made use 
of these functionalities in this book. Nevertheless, we think it is worth mentioning
the RQuantLib package because it contains several interesting functionalities, in
particular for European, American and Asian option pricing as well as implied
volatility analysis. For more information, we suggest checking the developer’s web
page at http://dirk.eddelbuettel.com/code/rquantlib.html. 


1 http://www.quantlib.org 
B.1.3 The quantmod package 

The package quantmod is another framework for the quick analysis of financial 
data. This package implements several methods for importing and plotting data.
It is closely related to the xts package, discussed later, which provides one of 
the time series classes in R, and also to the TTR package for the on-the-fly
calculation of the indexes to be plotted on the charts. We will discuss the data
import functionalities in more detail in Section B.5, but here we mention a 
few of the graphical capabilities. The main function is chartSeries, which 
can plot xts objects in a very effective way. We have used the basic plot several 
times in this book (see, e.g., Chapter 6). We start by getting the data for the 
AAPL symbol 

R> require(quantmod) 

R> getSymbols("AAPL") 

[1] "AAPL" 

The function getSymbols creates an object of class xts in the R workspace, 
with the same name as the symbol. Figure B.1 represents the basic plot obtained 
with the following basic call to chartSeries: 

R> chartSeries(AAPL, theme = "white", TA = NULL) 

To this graph one can add several features like the Bollinger bands or the volume 
of exchanges, as Figure B.2, generated by the next code, shows. 


Figure B.1 Basic chartSeries plot (AAPL [2007-01-03/2010-11-15]). 

Figure B.2 Advanced chartSeries plot. 


R> chartSeries(AAPL, theme = "white", TA = "addVo();addBBands();addCCI()") 

Bollinger Bands, which are added using the function addBBands, are obtained as 
the moving average of the last, say, 20 quotations of the asset plus or minus a
multiple (typically two) of the standard deviation of the same quotations. 
The function addBBands can be configured in several ways. The argument TA in 
chartSeries is used to add technical indicators to the plot. There are several 
indexes that can be plotted on a chartSeries plot, and the package TTR 
also provides additional functionalities. Users can also write their own TA
functions. 
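To make the construction of the bands concrete, here is a minimal base R sketch that computes a 20-period moving average and the bands at two standard deviations on simulated prices. The data are artificial, and the exact conventions used by addBBands (for instance the type of moving average or of standard deviation) may differ from the ones assumed here.

```r
# Simulated closing prices (illustrative data, not market quotes)
set.seed(123)
price <- 100 + cumsum(rnorm(250))

n <- 20   # window length
k <- 2    # number of standard deviations

# embed() builds a matrix whose i-th row holds the n prices ending at time i + n - 1
w <- embed(price, n)
mid <- rowMeans(w)          # simple moving average
sdev <- apply(w, 1, sd)     # rolling (sample) standard deviation

bands <- cbind(lower = mid - k * sdev, mid = mid, upper = mid + k * sdev)
head(bands, 3)
```

The first row of bands refers to the 20th observation, since a full window of 20 quotations is needed before the bands can be computed.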

B.2 Summary of main time series objects in R 

Financial time series in R can be handled in different ways depending on 
their nature. 
B.2.1 The ts class 

The basic class of time series objects is the ts class. This class has an extension 
called mts for multidimensional time series but, apart from the dimensionality of 
the data, they share the same properties. The ts class is meant for regular time 
series where observations have a given frequency (e.g. 12 for monthly data, 
7 for daily data, etc.) and a given time distance deltat between observations. 
When an object of this class is created, one should specify the start date start
and/or the final date end. For example, if we want to create a time series of
quarterly data starting from the second quarter of 1959 we write something like 

R> X <- ts(1:10, frequency = 4, start = c(1959, 2)) 

R> X 

     Qtr1 Qtr2 Qtr3 Qtr4
1959         1    2    3
1960    4    5    6    7
1961    8    9   10



If we want to create monthly data starting from July 1954, we write something 
like 


R> set.seed(123) 

R> X <- ts(cumsum(1 + round(rnorm(100), 2)), start = c(1954, 7), 
+     frequency = 12) 

R> X 

        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
1954                                             0.44   1.21   3.77   4.84
1955  10.15   9.88  10.19  10.74  12.96  14.32  15.72  16.83  17.27  20.06
1956  22.29  22.82  22.75  23.53  23.50  23.77  24.14  23.45  25.29  26.44
1957  29.98  30.68  32.58  34.46  36.28  37.97  39.52  40.46  41.15  41.77
1958  42.60  45.77  47.98  47.86  48.46  48.99  50.77  51.69  52.94  53.91
1959  58.01  60.53  59.98  61.56  62.68  63.90  65.28  65.78  66.45  66.43
1960  69.11  70.16  72.08  75.13  75.64  74.33  76.34  76.63  76.94  78.97
1961  80.65  81.51  82.52  83.91  84.54  86.18  86.96  88.29  90.39  91.83
1962  96.64  98.19  99.43  99.80 102.16 102.56 105.75 108.28 109.04 109.01
        Nov    Dec
1954   5.97   8.69
1955  21.56  20.59
1956  26.30  28.55
1957  42.08  42.87
1958  54.87  57.24
1959  66.36  67.66
1960  79.69  79.47
1961  92.50  94.65
1962


There are several accessory functions to obtain information from a ts object. In 
particular, time extracts the vector of time instants of each observation of the 
time series; deltat returns the Δt between observations; start and end return the
initial and final dates, and frequency returns the frequency of the time series. 
R> time(X)[1:10] 

 [1] 1954.500 1954.583 1954.667 1954.750 1954.833 1954.917 1955.000 1955.083
 [9] 1955.167 1955.250

R> deltat(X) 

[1] 0.08333333 

R> start(X) 

[1] 1954 7 

R> end(X) 

[1] 1962 10 

R> frequency(X) 

[1] 12 

These accessory functions are also available for the other classes presented below, 
possibly with some class-specific behavior. In addition to the accessory functions,
one can extract a subseries using the window function. For example, if instead of
monthly data we want to extract quarterly data from X above, we can do the following: 

R> window(X, frequency = 4) 

       Qtr1   Qtr2   Qtr3   Qtr4
1954                 0.44   4.84
1955  10.15  10.74  15.72  20.06
1956  22.29  23.53  24.14  26.44
1957  29.98  34.46  39.52  41.77
1958  42.60  47.86  50.77  53.91
1959  58.01  61.56  65.28  66.43
1960  69.11  75.13  76.34  78.97
1961  80.65  83.91  86.96  91.83
1962  96.64  99.80 105.75 109.01
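Note that window with a lower frequency subsamples the series: it keeps one monthly observation out of every three. If quarterly summaries are wanted instead, the base R function aggregate can be used. The sketch below contrasts the two; the choice of the mean as summary function is ours.

```r
set.seed(123)
X <- ts(cumsum(1 + round(rnorm(100), 2)), start = c(1954, 7), frequency = 12)

# Subsampling: keeps the observation at the start of each quarter
Xq <- window(X, frequency = 4)

# Aggregation: mean of each block of three consecutive months,
# starting from the first observation; an incomplete last block is dropped
Xm <- aggregate(X, nfrequency = 4, FUN = mean)

length(Xq)
length(Xm)
start(Xm)
```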


B.2.2 The zoo class 

The zoo class can host time series in a very abstract way. Indeed, zoo objects 
are objects indexed by an abstract set of indexes which we can put in relation with
the set of integer numbers Z (hence the name ‘zoo’). The zoo class is defined 
in the zoo package. When a zoo object is created, if the set of indexes is not 
specified, by default an increasing sequence of integers is used. 


R> require(zoo) 

R> X <- zoo(rnorm(10)) 

R> X 

          1           2           3           4           5
-0.71040656  0.25688371 -0.24669188 -0.34754260 -0.95161857
          6           7           8           9          10
-0.04502772 -0.78490447 -1.66794194 -0.38022652  0.91899661

R> str(X) 

'zoo' series from 1 to 10 

Data: num [1:10] -0.71 0.257 -0.247 -0.348 -0.952 ... 

Index: int [1:10] 1 2 3 4 5 6 7 8 9 10 

To access or modify the indexes one can use either time or, better, index:

R> index(X) 

[1] 1 2 3 4 5 6 7 8 9 10 

The advantage of the zoo indexing is that it can host irregular time series. For 
example, if we generate 10 random times from the exponential distribution we 
can create a time series under Poisson random sampling as follows: 

R> X <- zoo(rnorm(10), order.by = cumsum(rexp(10, rate = 0.1))) 

R> X 

     8.2326     12.9902     47.6261     60.3664     71.1813
-1.02412879  0.11764660 -0.94747461 -0.49055744 -0.25609219
    74.1509     74.9925    104.5265     124.209    130.8476
 1.84386201 -0.65194990  0.23538657  0.07796085 -0.96185663


R> str(X) 

'zoo' series from 8.23260500985098 to 130.847554074331 
Data: num [1:10] -1.024 0.118 -0.947 -0.491 -0.256 ... 

Index: num [1:10] 8.23 12.99 47.63 60.37 71.18 ... 

If one wants to use an approach similar to ts to create regularly spaced time 
series, one should explicitly use the zooreg function: 

R> Xreg <- zooreg(cumsum(1 + round(rnorm(100), 2)), start = c(1954, 
+     7), frequency = 12) 

R> time(Xreg)[1:10] 

 [1] 1954.500 1954.583 1954.667 1954.750 1954.833 1954.917 1955.000 1955.083
 [9] 1955.167 1955.250

It is possible to convert a ts object to a zoo object without problems, but the
converse is possible only if the time series is regularly spaced; otherwise the
times are lost, as in the next example: 

R> Y <- as.ts(X) 

R> time(X) 

 [1]   8.232605  12.990195  47.626143  60.366421  71.181272  74.150864
 [7]  74.992471 104.526480 124.209003 130.847554

R> time(Y) 


Time Series: 

Start = 1 
End = 10 
Frequency = 1 

[1] 1 2 3 4 5 6 7 8 9 10 

B.2.3 The xts class 

The xts class, where the ‘x’ stands for ‘extensible’, extends the functionality of 
the zoo class specifically to handle times and dates. It also extends the object in
the sense that it allows for the inclusion of metadata like ‘last data update’ or
similar. It is also written entirely in a low-level language to gain speed when
accessing or subsetting the object, which may be slow in some cases for the other
classes of objects above. The xts package is required to use this new class. The xts
function does not assign an index to the object, so one has to explicitly create the
index of times 


R> require(xts) 

R> X <- xts(rnorm(10), order.by = as.Date(cumsum(rexp(10,
+     rate = 0.1))))

R> X

                 [,1]
1970-01-16 -1.0155926
1970-01-22  1.9552940
1970-01-31 -0.0903196
1970-02-04  0.2145388
1970-02-05 -0.7385277
1970-02-08 -0.5743887
1970-02-21 -1.3170161
1970-03-15 -0.1829254
1970-03-23  0.4189824
1970-03-23  0.3243043

R> str(X)

An 'xts' object from 1970-01-16 to 1970-03-23 containing:
  Data: num [1:10, 1] -1.0156 1.9553 -0.0903 0.2145 -0.7385 ...
  Indexed by objects of class: [Date] TZ:
  xts Attributes:
 NULL

and in the above we have used as.Date to transform a vector of numeric indexes into
dates. With as.xts it is possible to convert objects from zoo and ts to xts
only if the time indexes are first transformed into true time/date class objects or if
an additional argument order.by is specified appropriately. The package xts
redefines the plot method for objects of class xts to highlight the irregular
time spacing. The next code produces the graph in Figure B.3.

R> plot(X)

Figure B.3 Result of the plot method for objects of class xts.


B.2.4 The irts class 

Another framework for irregular time series is provided by the package tseries in
the class irts. The use of irts is similar to zoo but the arguments are reversed:
the first argument is the vector of times and the second argument is the vector
of values of the time series:

R> require(tseries) 

R> X <- irts(cumsum(rexp(10, rate = 0.1)), rnorm(10))

R> X 


1970-01-01 00:00:05 GMT -1.388
1970-01-01 00:00:16 GMT -0.2646
1970-01-01 00:00:28 GMT -0.9473
1970-01-01 00:00:36 GMT 0.7395
1970-01-01 00:01:00 GMT 0.8968
1970-01-01 00:01:11 GMT -0.346
1970-01-01 00:01:30 GMT -1.782
1970-01-01 00:01:30 GMT 0.4649
1970-01-01 00:01:30 GMT -1.951
1970-01-01 00:01:33 GMT -0.5161


R> str(X) 

List of 2 

 $ time : POSIXct[1:10], format: "1970-01-01 09:00:05" "1970-01-01 09:00:16" ...

$ value: num [1:10] -1.388 -0.265 -0.947 0.74 0.897 ... 
- attr(*, "class")= chr "irts" 


As usual, there exist functions to convert objects from one class to another.

B.2.5 The timeSeries class

The last class we present is timeSeries, from the package timeSeries of the
Rmetrics suite. This package depends on the timeDate package, which is used
to store time and date objects in a system-independent way. We will discuss the
relevance of this issue in Section B.3.


R> require(timeSeries) 

R> X <- timeSeries(rnorm(10), as.Date(cumsum(rexp(10,
+     rate = 0.1))))

R> X

GMT
                          TS.1
1970-01-21 06:11:34  0.4890803
1970-01-22 15:29:47  0.9020247
1970-01-26 07:37:02  0.6403630
1970-01-29 15:10:12  0.9512089
1970-01-30 05:31:07 -0.5991232
1970-01-30 10:14:35 -1.3306950
1970-03-06 10:14:41 -0.5922097
1970-04-08 10:04:43  1.8010509
1970-04-09 19:31:46  1.0553386
1970-04-12 19:39:18 -0.2919208


R> str(X)

Time Series:
 Name:               object
Data Matrix:
 Dimension:          10 1
 Column Names:       TS.1
 Row Names:          1970-01-21 06:11:34 ... 1970-04-12 19:39:18
Positions:
 Start:              1970-01-21 06:11:34
 End:                1970-04-12 19:39:18
With:
 Format:             %Y-%m-%d %H:%M:%S
 FinCenter:          GMT
 Units:              TS.1
 Title:              Time Series Object
 Documentation:      Wed Nov 17 00:18:09 2010


B.3 Dates and time handling 

Dates and time stamps are a distinctive feature of time series, and R supports many formats.
Most of the time the user downloads data from a service and wants to keep
or transform the time stamps of the data. Or, vice versa, after data have been
simulated, one wants to attach the correct time information to the data. We now
explain the basic concepts and see how different classes of time series handle
these formats. We start with the POSIX formats. POSIX (Portable Operating
System Interface) is the IEEE standard for a common interface to UNIX-like
operating systems, now also available on other platforms. There is a POSIX standard
for dates but also for many other tasks. Here we focus on dates. Using the
function ISOdate it is possible to create a date object very easily as follows:

R> d <- ISOdate(2006, 6, 9) 

R> d 

[1] "2006-06-09 12:00:00 GMT" 

The function ISOdate accepts the following arguments, as we see using the
command args:

R> args(ISOdate)

function (year, month, day, hour = 12, min = 0, sec = 0, tz = "GMT")
NULL 

All arguments are easy to understand. The most important one is tz, the time zone
argument. By default the time zone is set to ‘GMT’, which corresponds to
Coordinated Universal Time (UTC), formerly called Greenwich Mean Time.
This means that the time represented by our object d refers to Greenwich
local time. Similarly, ‘CET’ is Central European Time, which corresponds to
UTC+1. This is the time adopted by countries in central Europe like Italy, France,
Spain, Germany, etc. We can see that this object d is indeed a POSIX time and
in particular it is of class POSIXct, where ct stands for ‘calendar time’.

R> class (d) 

[1] "POSIXt" "POSIXct" 

Internally, objects of class POSIXct are stored as the number of seconds since
the beginning of 1970 in the UTC time zone. A second representation is called
POSIXlt, which is internally stored as a list with the following entries:

R> names(as.POSIXlt(d))

[1] "sec"   "min"   "hour"  "mday"  "mon"   "year"  "wday"  "yday"  "isdst"

R> unlist(as.POSIXlt(d))

  sec   min  hour  mday   mon  year  wday  yday isdst
    0     0    12     9     5   106     5   159     0

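The two representations can be compared directly. The following is a small sketch using only base R; the variable names are illustrative. It shows that POSIXct is a single number, while the POSIXlt fields use offsets (years since 1900, months counted from 0) that must be adjusted before they read as calendar values.

```r
d <- ISOdate(2006, 6, 9)             # POSIXct, noon GMT by default

# POSIXct: one number, the seconds elapsed since 1970-01-01 00:00:00 UTC
secs <- as.numeric(d)

# POSIXlt: a list of calendar fields
lt <- as.POSIXlt(d, tz = "GMT")
year  <- lt$year + 1900              # 'year' is stored as years since 1900
month <- lt$mon + 1                  # 'mon' runs from 0 (January) to 11
day   <- lt$mday                     # day of the month, starting at 1

# Converting back to POSIXct recovers the same instant
back <- as.POSIXct(lt)
c(year = year, month = month, day = day)
```

The list form is convenient for extracting calendar components, while the numeric form is the one to prefer for arithmetic and for use as a time series index.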
It is possible to convert one format into the other using the as.POSIXlt and
as.POSIXct coercion functions. Because POSIX dates contain calendar
information, it is possible to represent them in different formats. Next are
several representations obtained with the format function, given without comments:

R> format(d, "%a")

[1] "Fri" 

R> format(d, "%A")

[1] "Friday"

R> format(d, "%b")

[1] "Jun"

R> format(d, "%B")

[1] "June"

R> format(d, "%c")

[1] "Fri 9 Jun 12:00:00 2006"

R> format(d, "%D")

[1] "06/09/06"

R> format(d, "%T")

[1] "12:00:00"

R> format(d, "%A %B %d %H:%M:%S %Y") 

[1] "Friday June 09 12:00:00 2006" 

R> format(d, "%A %d/%m/%Y") 

[1] "Friday 09/06/2006" 

R> format(d, "%d/%m/%Y (%A)") 

[1] "09/06/2006 (Friday)" 

and so forth. For the complete set of % conversion specifications one should read the
man page of format.POSIXct (also reachable as ?strptime). It is also possible to
convert strings into real date objects with the function strptime, as in the
following simple example:

R> x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")

R> strptime(x, "%d%b%Y") 

[1] "1960-01-01" "1960-01-02" "1960-03-31" "1960-07-30" 

In this case ‘jan’, ‘mar’ and ‘jul’ are interpreted correctly as ‘January’, ‘March’ and ‘July’,
but in different locales, e.g. Italian, ‘jan’ and ‘jul’ will not be recognized by
the system and hence strptime returns an NA date. Users should check their
environment carefully before attempting data manipulations like the above. R
offers the functions Sys.getlocale and Sys.setlocale to inspect and change
the current ‘locale’ setting. The next example temporarily sets the locale
to Italian and then switches it back to UK English (see footnote 2):

R> Sys.getlocale() 

[1] "en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8" 

R> Sys.setlocale("LC_ALL", "it_it")

[1] "it_it/it_it/it_it/C/it_it/en_GB.UTF-8"

R> strptime (x, "%d%b%Y") 

[1] NA NA "1960-03-31" NA 

R> Sys.setlocale("LC_ALL", "en_GB") 

[1] "en_GB/en_GB/en_GB/C/en_GB/en_GB.UTF-8"

R> strptime(x, "%d%b%Y") 

[1] "1960-01-01" "1960-01-02" "1960-03-31" "1960-07-30" 

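When the input uses English month abbreviations, one way to avoid depending on the user's locale is to switch only the LC_TIME category to the portable "C" locale for the duration of the parsing. The helper below is a hypothetical sketch (parse_c_locale is not part of R or of any package mentioned here); it relies only on the base functions Sys.getlocale, Sys.setlocale, on.exit and strptime.

```r
# Hypothetical helper: parse date strings under the "C" locale, then
# restore the previous LC_TIME setting even if parsing fails
parse_c_locale <- function(x, format, tz = "GMT") {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  strptime(x, format, tz = tz)
}

x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
res <- parse_c_locale(x, "%d%b%Y")

# Numeric output formats do not depend on the locale
format(res, "%Y-%m-%d")
```

Changing only LC_TIME leaves the other locale categories (collation, messages, and so on) untouched, which is usually less intrusive than setting LC_ALL.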
When dates are created without any time specification, by default ISOdate uses
noon (12:00) as the reference time, while as.POSIXct uses midnight (00:00):

R> format(ISOdate(2006, 6, 9), "%H:%M:%S")

[1] "12:00:00"

R> format(as.POSIXct("2006-06-09"), "%H:%M:%S")

[1] "00:00:00" 

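This difference in defaults matters when time stamps built in the two ways are compared or used to align series. A short sketch with base R only (the object names are illustrative, and the time zone is fixed to GMT explicitly so that the result does not depend on the local settings):

```r
a <- ISOdate(2006, 6, 9)                        # noon GMT
b <- as.POSIXct("2006-06-09", tz = "GMT")       # midnight GMT

# Same calendar day, but the two instants are 12 hours apart
gap_hours <- as.numeric(difftime(a, b, units = "hours"))

# Two ways to make them comparable: drop the time of day ...
same_day <- as.Date(a) == as.Date(b)

# ... or ask ISOdate for midnight explicitly
same_instant <- ISOdate(2006, 6, 9, hour = 0) == b

c(gap_hours = gap_hours, same_day = same_day, same_instant = same_instant)
```

For daily data it is often simplest to index by Date rather than by POSIXct, which removes the time-of-day ambiguity altogether.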

B.3.1 Dates manipulation 

It is also possible to extract information which is of interest in empirical finance. 
For example, the package timeDate implements functions like holiday* which 
describe the dates of holidays of several financial centers: 

R> holidayNYSE() 

NewYork
[1] [2010-01-01] [2010-01-18] [2010-02-15] [2010-04-02] [2010-05-31]
[6] [2010-07-05] [2010-09-06] [2010-11-25] [2010-12-24]

R> holidayNERC()

Eastern
[1] [2009-12-31 19:00:00] [2010-05-30 20:00:00] [2010-07-04 20:00:00]
[4] [2010-09-05 20:00:00] [2010-11-24 19:00:00]

2 Note that you have to check the exact naming of the locales on your system. Our example was
run on OS X.

It is possible to make calculations with times such as the following: 

R> ISOdate(2006, 6, 9) - ISOdate(2006, 3, 1) 

Time difference of 100 days

Or, using the timeDate class:

R> my.dates = timeDate(c("2001-01-05", "2001-02-15")) 

R> my.dates[2] - my.dates[1] 

Time difference of 41 days 

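A few further calculations of this kind can be done with base R alone. The sketch below is illustrative (the object names are not from the text): it expresses a time difference in chosen units, generates a monthly sequence of dates, and drops weekends from a daily sequence using the locale-independent %u weekday code.

```r
d1 <- ISOdate(2006, 3, 1)
d2 <- ISOdate(2006, 6, 9)

# Time difference with explicit units
gap <- difftime(d2, d1, units = "days")
gap_days  <- as.numeric(gap)                    # number of days
gap_weeks <- as.numeric(gap, units = "weeks")   # same interval in weeks

# First day of six consecutive months
month_starts <- seq(as.Date("2009-01-01"), by = "month", length.out = 6)

# Daily sequence without Saturdays and Sundays;
# "%u" gives the weekday as "1" (Monday) to "7" (Sunday) in any locale
days <- seq(as.Date("2006-03-01"), as.Date("2006-03-10"), by = "day")
weekdays_only <- days[!(format(days, "%u") %in% c("6", "7"))]
weekdays_only
```

Removing weekends is only a first approximation of a trading calendar; exchange holidays such as those returned by holidayNYSE still have to be excluded separately.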

Time zones are also important when one wants to synchronize data coming from
different financial sources. We will see that, if the data possess correct time stamps,
this is quite easy. The package timeDate contains the function listFinCenter,
which lists the names of known financial centers. This is a growing list, but we
can get an idea of what this command produces for European countries:

R> listFinCenter("Europe*") 


 [1] "Europe/Amsterdam"   "Europe/Andorra"     "Europe/Athens"
 [4] "Europe/Belgrade"    "Europe/Berlin"      "Europe/Bratislava"
 [7] "Europe/Brussels"    "Europe/Bucharest"   "Europe/Budapest"
[10] "Europe/Chisinau"    "Europe/Copenhagen"  "Europe/Dublin"
[13] "Europe/Gibraltar"   "Europe/Guernsey"    "Europe/Helsinki"
[16] "Europe/Isle_of_Man" "Europe/Istanbul"    "Europe/Jersey"
[19] "Europe/Kaliningrad" "Europe/Kiev"        "Europe/Lisbon"
[22] "Europe/Ljubljana"   "Europe/London"      "Europe/Luxembourg"
[25] "Europe/Madrid"      "Europe/Malta"       "Europe/Mariehamn"
[28] "Europe/Minsk"       "Europe/Monaco"      "Europe/Moscow"
[31] "Europe/Oslo"        "Europe/Paris"       "Europe/Podgorica"
[34] "Europe/Prague"      "Europe/Riga"        "Europe/Rome"
[37] "Europe/Samara"      "Europe/San_Marino"  "Europe/Sarajevo"
[40] "Europe/Simferopol"  "Europe/Skopje"      "Europe/Sofia"
[43] "Europe/Stockholm"   "Europe/Tallinn"     "Europe/Tirane"
[46] "Europe/Uzhgorod"    "Europe/Vaduz"       "Europe/Vatican"
[49] "Europe/Vienna"      "Europe/Vilnius"     "Europe/Volgograd"
[52] "Europe/Warsaw"      "Europe/Zagreb"      "Europe/Zaporozhye"
[55] "Europe/Zurich"




The information about the financial center can be passed when the date objects
are created to fix the correct time zone and geographical information, e.g.

R> d1 <- timeDate("2001-01-05", FinCenter = "Europe/Paris")
R> d2 <- timeDate("2001-01-05", FinCenter = "America/New_York")
R> d1

Europe/Paris
[1] [2001-01-05 01:00:00]

R> d2

America/New_York
[1] [2001-01-04 19:00:00]

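The same effect can be reproduced with base R, which helps to see what the financial center does: the instant is fixed (here midnight GMT) and only its local representation changes. This is a sketch under the assumption that the system time zone database knows the names "Europe/Paris" and "America/New_York", as is the case on most installations.

```r
# One fixed instant: midnight GMT on 5 January 2001
t0 <- as.POSIXct("2001-01-05 00:00:00", tz = "GMT")

# The same instant expressed in two local times
paris <- format(t0, "%Y-%m-%d %H:%M:%S", tz = "Europe/Paris")
ny    <- format(t0, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")

c(Paris = paris, NewYork = ny)
# Paris is one hour ahead of GMT in January, New York five hours behind,
# so this gives "2001-01-05 01:00:00" and "2001-01-04 19:00:00"
```

Because both strings describe the same underlying number of seconds, series stamped in different centers can be aligned by comparing the POSIXct values rather than the printed local times.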
For further information about date/time manipulation, a suggested reading is the
time/date FAQ ebook by Würtz et al. (2010).


B.3.2 Using date objects to index time series 

We now see how to assign times and dates as the index of a time series. We consider
only zoo, xts and timeSeries objects. We first create some random data and
some date strings:

R> set.seed(123) 

R> data <- rnorm(6) 

R> charvec <- paste("2009-0", 1:6, "-01", sep = "")

R> charvec

[1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-04-01" "2009-05-01"
[6] "2009-06-01"

then generate the corresponding objects with the different classes: 

R> X <- zoo(data, as.Date(charvec)) 

R> Y <- xts(data, as.Date(charvec)) 

R> Z <- timeSeries(data, charvec) 

and see how they look:

R> X

 2009-01-01  2009-02-01  2009-03-01  2009-04-01  2009-05-01  2009-06-01
-0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499

R> Y

                  [,1]
2009-01-01 -0.56047565
2009-02-01 -0.23017749
2009-03-01  1.55870831
2009-04-01  0.07050839
2009-05-01  0.12928774
2009-06-01  1.71506499

R> Z

GMT
                  TS.1
2009-01-01 -0.56047565
2009-02-01 -0.23017749
2009-03-01  1.55870831
2009-04-01  0.07050839
2009-05-01  0.12928774
2009-06-01  1.71506499


Similarly, we could have used one of the following approaches:

R> z1 <- zoo(data, as.POSIXct(charvec))

R> z2 <- zoo(data, ISOdatetime(2009, 1:6, 1, 0, 0, 0))
R> z3 <- zoo(data, ISOdate(2009, 1:6, 1, 0))

R> z1

 2009-01-01  2009-02-01  2009-03-01  2009-04-01  2009-05-01
-0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
 2009-06-01
 1.71506499

R> z2

 2009-01-01  2009-02-01  2009-03-01  2009-04-01  2009-05-01
-0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
 2009-06-01
 1.71506499

R> z3

 2009-01-01  2009-02-01  2009-03-01  2009-04-01  2009-05-01
-0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
 2009-06-01
 1.71506499


B.4 Binding of time series 

Suppose we have two parts of the same time series collected in different periods
of time. It is possible to bind them by row, i.e. by date, using the rbind function.
If the time indexes do not overlap, all classes perform in the same way:

R> set.seed(123)

R> d1 <- rnorm(5)

R> d2 <- rnorm(7)

R> date1 <- ISOdate(2009, 1:5, 1)

R> date2 <- ISOdate(2009, 6:12, 1)

R> z1 <- zoo(d1, date1)

R> z2 <- zoo(d2, date2)

R> rbind(z1, z2)

2009-01-01 21:00:00 2009-02-01 21:00:00 2009-03-01 21:00:00
        -0.56047565         -0.23017749          1.55870831
2009-04-01 21:00:00 2009-05-01 21:00:00 2009-06-01 21:00:00
         0.07050839          0.12928774          1.71506499
2009-07-01 21:00:00 2009-08-01 21:00:00 2009-09-01 21:00:00
         0.46091621         -1.26506123         -0.68685285
2009-10-01 21:00:00 2009-11-01 21:00:00 2009-12-01 21:00:00
        -0.44566197          1.22408180          0.35981383


R> x1 <- xts(d1, date1)
R> x2 <- xts(d2, date2)
R> rbind(x1, x2)

                           [,1]
2009-01-01 21:00:00 -0.56047565
2009-02-01 21:00:00 -0.23017749
2009-03-01 21:00:00  1.55870831
2009-04-01 21:00:00  0.07050839
2009-05-01 21:00:00  0.12928774
2009-06-01 21:00:00  1.71506499
2009-07-01 21:00:00  0.46091621
2009-08-01 21:00:00 -1.26506123
2009-09-01 21:00:00 -0.68685285
2009-10-01 21:00:00 -0.44566197
2009-11-01 21:00:00  1.22408180
2009-12-01 21:00:00  0.35981383

R> s1 <- timeSeries(d1, date1)
R> s2 <- timeSeries(d2, date2)
R> rbind(s1, s2)

GMT
                      TS.1_TS.1
2009-01-01 12:00:00 -0.56047565
2009-02-01 12:00:00 -0.23017749
2009-03-01 12:00:00  1.55870831
2009-04-01 12:00:00  0.07050839
2009-05-01 12:00:00  0.12928774
2009-06-01 12:00:00  1.71506499
2009-07-01 12:00:00  0.46091621
2009-08-01 12:00:00 -1.26506123
2009-09-01 12:00:00 -0.68685285
2009-10-01 12:00:00 -0.44566197
2009-11-01 12:00:00  1.22408180
2009-12-01 12:00:00  0.35981383

but when indexes do overlap, some classes fail. In particular, zoo requires
unique indexes. Thus, for example, if we create overlapping dates with this:

R> date2 <- ISOdate(2009, 4:10, 1) 

R> z2 <- zoo(d2, date2) 

the following code produces an error: 


R> rbind(z1, z2)


Error in rbind(deparse.level,...) : indexes overlap 


while xts and timeSeries simply duplicate the entries:

R> x2 <- xts(d2, date2)
R> rbind(x1, x2)

                           [,1]
2009-01-01 21:00:00 -0.56047565
2009-02-01 21:00:00 -0.23017749
2009-03-01 21:00:00  1.55870831
2009-04-01 21:00:00  0.07050839
2009-04-01 21:00:00  1.71506499
2009-05-01 21:00:00  0.12928774
2009-05-01 21:00:00  0.46091621
2009-06-01 21:00:00 -1.26506123
2009-07-01 21:00:00 -0.68685285
2009-08-01 21:00:00 -0.44566197
2009-09-01 21:00:00  1.22408180
2009-10-01 21:00:00  0.35981383


R> s2 <- timeSeries(d2, date2)
R> rbind(s1, s2)

GMT
                      TS.1_TS.1
2009-01-01 12:00:00 -0.56047565
2009-02-01 12:00:00 -0.23017749
2009-03-01 12:00:00  1.55870831
2009-04-01 12:00:00  0.07050839
2009-05-01 12:00:00  0.12928774
2009-04-01 12:00:00  1.71506499
2009-05-01 12:00:00  0.46091621
2009-06-01 12:00:00 -1.26506123
2009-07-01 12:00:00 -0.68685285
2009-08-01 12:00:00 -0.44566197
2009-09-01 12:00:00  1.22408180
2009-10-01 12:00:00  0.35981383

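If the zoo class is to be kept, one possible workaround for overlapping indexes is to decide which series takes precedence and drop the overlapping dates from the other one before calling rbind. The sketch below assumes the zoo package is installed and gives precedence to the first series; the object names keep and z are illustrative, and this is one convention among several, not the only sensible choice.

```r
library(zoo)

set.seed(123)
d1 <- rnorm(5)
d2 <- rnorm(7)
z1 <- zoo(d1, ISOdate(2009, 1:5, 1))
z2 <- zoo(d2, ISOdate(2009, 4:10, 1))   # April and May overlap with z1

# Keep only the observations of z2 whose time stamp is not already in z1
keep <- !(time(z2) %in% time(z1))

# The indexes are now disjoint, so rbind no longer fails
z <- rbind(z1, z2[keep])
time(z)
```

Giving precedence to z2 instead only requires swapping the roles of the two objects in the construction of keep.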

A different approach is merging, which is analogous to the corresponding
functionality for standard data.frame objects in R. Also in this case, the
classes zoo and xts behave similarly and timeSeries differs. Indeed, both
of the following calls to the merge function

R> merge(z1, z2)




                             z1         z2
2009-01-01 21:00:00 -0.56047565         NA
2009-02-01 21:00:00 -0.23017749         NA
2009-03-01 21:00:00  1.55870831         NA
2009-04-01 21:00:00  0.07050839  1.7150650
2009-05-01 21:00:00  0.12928774  0.4609162
2009-06-01 21:00:00          NA -1.2650612
2009-07-01 21:00:00          NA -0.6868529
2009-08-01 21:00:00          NA -0.4456620
2009-09-01 21:00:00          NA  1.2240818
2009-10-01 21:00:00          NA  0.3598138


R> merge(x1, x2)

                             x1         x2
2009-01-01 21:00:00 -0.56047565         NA
2009-02-01 21:00:00 -0.23017749         NA
2009-03-01 21:00:00  1.55870831         NA
2009-04-01 21:00:00  0.07050839  1.7150650
2009-05-01 21:00:00  0.12928774  0.4609162
2009-06-01 21:00:00          NA -1.2650612
2009-07-01 21:00:00          NA -0.6868529
2009-08-01 21:00:00          NA -0.4456620
2009-09-01 21:00:00          NA  1.2240818
2009-10-01 21:00:00          NA  0.3598138


produce a two-dimensional time series, aligning the time indexes and setting to
NA the missing observations in each time series. In contrast, the timeSeries
class produces a single one-dimensional time series, keeping duplicates as
in rbind:


R> merge(s1, s2)

GMT
                           TS.1
2009-01-01 12:00:00 -0.56047565
2009-02-01 12:00:00 -0.23017749
2009-03-01 12:00:00  1.55870831
2009-04-01 12:00:00  0.07050839
2009-04-01 12:00:00  1.71506499
2009-05-01 12:00:00  0.12928774
2009-05-01 12:00:00  0.46091621
2009-06-01 12:00:00 -1.26506123
2009-07-01 12:00:00 -0.68685285
2009-08-01 12:00:00 -0.44566197
2009-09-01 12:00:00  1.22408180
2009-10-01 12:00:00  0.35981383

To obtain a behaviour similar to zoo and xts, one should specify different
units (i.e. column names) for the two series. For example:

R> s2 <- timeSeries(d2, date2, units = "s2")
R> merge(s1, s2)




                           TS.1         s2
2009-01-01 12:00:00 -0.56047565         NA
2009-02-01 12:00:00 -0.23017749         NA
2009-03-01 12:00:00  1.55870831         NA
2009-04-01 12:00:00  0.07050839  1.7150650
2009-05-01 12:00:00  0.12928774  0.4609162
2009-06-01 12:00:00          NA -1.2650612
2009-07-01 12:00:00          NA -0.6868529
2009-08-01 12:00:00          NA -0.4456620
2009-09-01 12:00:00          NA  1.2240818
2009-10-01 12:00:00          NA  0.3598138

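For zoo objects, the alignment performed by merge can be tuned through the all and fill arguments of the zoo merge method. The following sketch assumes the zoo package is installed; the object names both and filled are illustrative. It contrasts the default outer join with an inner join restricted to the common dates, and with an outer join in which the gaps are filled with a chosen value instead of NA.

```r
library(zoo)

set.seed(123)
z1 <- zoo(rnorm(5), ISOdate(2009, 1:5, 1))    # January to May
z2 <- zoo(rnorm(7), ISOdate(2009, 4:10, 1))   # April to October

# Inner join: only the time stamps present in both series (April, May)
both <- merge(z1, z2, all = FALSE)

# Outer join, replacing missing observations with 0 instead of NA
filled <- merge(z1, z2, fill = 0)

both
```

Whether filling with a constant is appropriate depends on the data: for prices, carrying the last observation forward is often preferred, while for returns or flows a zero may be the natural choice.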

A final remark is that, while for zoo and xts the arguments of rbind are treated
symmetrically, in timeSeries they are not. Hence, for example:

R> date1 <- ISOdate(2009, 1:5, 1)
R> date2 <- ISOdate(2009, 6:12, 1)
R> s1 <- timeSeries(d1, date1)

R> s2 <- timeSeries(d2, date2)

R> rbind(s1, s2)


GMT
                      TS.1_TS.1
2009-01-01 12:00:00 -0.56047565
2009-02-01 12:00:00 -0.23017749
2009-03-01 12:00:00  1.55870831
2009-04-01 12:00:00  0.07050839
2009-05-01 12:00:00  0.12928774
2009-06-01 12:00:00  1.71506499
2009-07-01 12:00:00  0.46091621
2009-08-01 12:00:00 -1.26506123
2009-09-01 12:00:00 -0.68685285
2009-10-01 12:00:00 -0.44566197
2009-11-01 12:00:00  1.22408180
2009-12-01 12:00:00  0.35981383

R> rbind(s2, s1)

GMT
                      TS.1_TS.1
2009-06-01 12:00:00  1.71506499
2009-07-01 12:00:00  0.46091621
2009-08-01 12:00:00 -1.26506123
2009-09-01 12:00:00 -0.68685285
2009-10-01 12:00:00 -0.44566197
2009-11-01 12:00:00  1.22408180
2009-12-01 12:00:00  0.35981383
2009-01-01 12:00:00 -0.56047565
2009-02-01 12:00:00 -0.23017749
2009-03-01 12:00:00  1.55870831
2009-04-01 12:00:00  0.07050839
2009-05-01 12:00:00  0.12928774

produce different ordering but one can still use, in all classes, the two functions 
sort and rev: 


R> sort(rbind(s2, s1))

GMT
                      TS.1_TS.1
2009-01-01 12:00:00 -0.56047565
2009-02-01 12:00:00 -0.23017749
2009-03-01 12:00:00  1.55870831
2009-04-01 12:00:00  0.07050839
2009-05-01 12:00:00  0.12928774
2009-06-01 12:00:00  1.71506499
2009-07-01 12:00:00  0.46091621
2009-08-01 12:00:00 -1.26506123
2009-09-01 12:00:00 -0.68685285
2009-10-01 12:00:00 -0.44566197
2009-11-01 12:00:00  1.22408180
2009-12-01 12:00:00  0.35981383


R> sort(rbind(s2, s1), decreasing = TRUE)

GMT
                      TS.1_TS.1
2009-12-01 12:00:00  0.35981383
2009-11-01 12:00:00  1.22408180
2009-10-01 12:00:00 -0.44566197
2009-09-01 12:00:00 -0.68685285
2009-08-01 12:00:00 -1.26506123
2009-07-01 12:00:00  0.46091621
2009-06-01 12:00:00  1.71506499
2009-05-01 12:00:00  0.12928774
2009-04-01 12:00:00  0.07050839
2009-03-01 12:00:00  1.55870831
2009-02-01 12:00:00 -0.23017749
2009-01-01 12:00:00 -0.56047565


to sort the dates in increasing or decreasing order, or to reverse the order of the time
stamps of a time series when they are downloaded from external resources:


R> s2

GMT
                          TS.1
2009-06-01 12:00:00  1.7150650
2009-07-01 12:00:00  0.4609162
2009-08-01 12:00:00 -1.2650612
2009-09-01 12:00:00 -0.6868529
2009-10-01 12:00:00 -0.4456620
2009-11-01 12:00:00  1.2240818
2009-12-01 12:00:00  0.3598138

R> rev(s2)

GMT
                          TS.1
2009-12-01 12:00:00  0.3598138
2009-11-01 12:00:00  1.2240818
2009-10-01 12:00:00 -0.4456620
2009-09-01 12:00:00 -0.6868529
2009-08-01 12:00:00 -1.2650612
2009-07-01 12:00:00  0.4609162
2009-06-01 12:00:00  1.7150650


B.4.1 Subsetting of time series 

Subsetting of time series is similar to indexing of matrix objects. For simplicity 
we make use of the data set quotes from the sde package which is stored in the 
zoo format. 


R> require(sde) 

To check the errata corrige of the book, type vignette("sde.errata")

R> data(quotes) 

R> str(quotes) 

'zoo' series from 2006-01-03 to 2007-12-31 

  Data: num [1:520, 1:20] 26.8 27 27 26.9 26.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:520] "2006-01-03" "2006-01-04" "2006-01-05" "2006-01-06" ...
  ..$ : chr [1:20] "MSOFT" "AMD" "DELL" "INTEL" ...
  Index: Class 'Date' num [1:520] 13151 13152 13153 13154 13157 ...

We can see that the Data slot consists of a matrix with attributes for colnames
and rownames, respectively the time series names and the time stamps. We can access
this object as follows:

R> quotes[1, 1:5] 


MSOFT AMD DELL INTEL HP 
2006-01-03 26.84 32.4 30.61 25.57 28.77 


R> quotes[1:10, "MSOFT"]

2006-01-03 2006-01-04 2006-01-05 2006-01-06 2006-01-09 2006-01-10
     26.84      26.97      26.99      26.91      26.86      27.00
2006-01-11 2006-01-12 2006-01-13 2006-01-16
     27.29      27.14      27.19      27.04

but we can also use the data.frame-like access with $:

R> quotes$MSOFT[1:10]

2006-01-03 2006-01-04 2006-01-05 2006-01-06 2006-01-09 2006-01-10
     26.84      26.97      26.99      26.91      26.86      27.00
2006-01-11 2006-01-12 2006-01-13 2006-01-16
     27.29      27.14      27.19      27.04

but we can also access data by dates. For example, 

R> date <- as.Date(sprintf("2006-07-%.2d", 1:10)) 

R> date 



 [1] "2006-07-01" "2006-07-02" "2006-07-03" "2006-07-04" "2006-07-05"
 [6] "2006-07-06" "2006-07-07" "2006-07-08" "2006-07-09" "2006-07-10"

R> quotes[date, 1:5]

            MSOFT   AMD   DELL  INTEL    HP
2006-07-03 23.700 24.60 24.590 19.360 32.51
2006-07-04 23.525 24.25 24.405 19.055 32.64
2006-07-05 23.350 23.90 24.220 18.750 32.77
2006-07-06 23.480 23.83 24.150 18.850 33.10
2006-07-07 23.300 23.56 23.870 18.560 32.85
2006-07-10 23.500 22.51 23.480 18.180 31.93

and we see that, in case of missing observations for some dates, these dates are 
ignored. Of course, because dates are objects, we can do selection on dates like 
this: 

R> start <- as.Date("2006-06-25") 

R> end <- as.Date("2006-07-10") 

R> quotes[(time(quotes) >= start) & (time(quotes) <= end), 1:5]

            MSOFT   AMD   DELL  INTEL    HP
2006-06-26 22.820 24.66 23.840 18.280 32.49
2006-06-27 22.860 24.26 23.710 18.050 31.94
2006-06-28 23.160 23.89 23.850 18.660 31.59
2006-06-29 23.470 24.81 24.620 19.320 32.03
2006-06-30 23.300 24.42 24.460 19.000 31.68
2006-07-03 23.700 24.60 24.590 19.360 32.51
2006-07-04 23.525 24.25 24.405 19.055 32.64
2006-07-05 23.350 23.90 24.220 18.750 32.77
2006-07-06 23.480 23.83 24.150 18.850 33.10
2006-07-07 23.300 23.56 23.870 18.560 32.85
2006-07-10 23.500 22.51 23.480 18.180 31.93

B.5 Loading data from financial data servers

There are several ways to obtain data: via HTTP queries to well-known financial data
providers, from local or remote databases, but also from commercial services. For example,
the package quantmod has a single function to get data from Yahoo! Finance 3,
Google Finance 4, FRED 5 (Federal Reserve Bank of St. Louis) or OANDA 6, but
also from local MySQL databases, csv files, or R data. We have already made
use of the function getSymbols for downloading data from Yahoo! Finance with

R> getSymbols("AAPL")

[1] "AAPL" 

R> attr(AAPL, "src") 

[1] "yahoo" 

but if we want to get the data from Google Finance we can specify the argument 
src as follows: 

R> getSymbols("AAPL", src = "google")

[1] "AAPL" 

R> attr(AAPL, "src") 

[1] "google" 

For exchange rates and currencies we can use either FRED or OANDA as follows:

R> getSymbols("DEXUSEU", src = "FRED")

[1] "DEXUSEU" 

R> attr(DEXUSEU, "src") 

[1] "FRED" 

R> getSymbols("EUR/USD", src = "oanda")

[1] "EURUSD"

R> attr(EURUSD, "src")

[1] "oanda"

R> str(EURUSD)

An 'xts' object from 2009-07-06 to 2010-11-16 containing:
  Data: num [1:499, 1] 1.4 1.39 1.4 1.39 1.4 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "EUR.USD"
 - attr(*, "src")= chr "oanda"
 - attr(*, "updated")= POSIXct[1:1], format: "2010-11-17 00:18:14"
  Indexed by objects of class: [Date] TZ:
  xts Attributes:
List of 2
 $ src    : chr "oanda"
 $ updated: POSIXct[1:1], format: "2010-11-17 00:18:14"

3 http://finance.yahoo.com
4 http://finance.google.com
5 http://research.stlouisfed.org/fred2
6 http://www.oanda.com

Notice that getSymbols returns an object of class xts in the R workspace with
the same name as the symbol.

Another option is the fImport package, which offers similar functionalities
but returns objects of class timeSeries or fWEBDATA. For example, to get data
from Yahoo! Finance, one can use either yahooSeries or yahooImport, e.g.

R> require(fImport)

R> X <- yahooSeries("AAPL")
R> str(X) 


Time Series:
 Name:               object
Data Matrix:
 Dimension:          252 6
 Column Names:       AAPL.Open AAPL.High AAPL.Low AAPL.Close
                     AAPL.Volume AAPL.Adj.Close
 Row Names:          2010-11-15 ... 2009-11-16
Positions:
 Start:              2009-11-16
 End:                2010-11-15
With:
 Format:             %Y-%m-%d
 FinCenter:          GMT
 Units:              AAPL.Open AAPL.High AAPL.Low AAPL.Close
                     AAPL.Volume AAPL.Adj.Close
 Title:              Time Series Object
 Documentation:      Wed Nov 17 00:18:19 2010


but

R> X <- yahooImport("AAPL")

R> str(X)

Formal class 'fWEBDATA' [package "fImport"] with 5 slots
  ..@ call       : language yahooImport(query = query, file = file, source = source,
      frequency = frequency, from = from, to = to, nDaysBack = nDaysBack, save = save,
  ..@ data       :Time Series:
 Name:               object
Data Matrix:
 Dimension:          6607 6
 Column Names:       Open High Low Close Volume Adj.Close
 Row Names:          2010-11-15 ... 1984-09-07
Positions:
 Start:              1984-09-07
 End:                2010-11-15
With:
 Format:             %Y-%m-%d
 FinCenter:          GMT
 Units:              Open High Low Close Volume Adj.Close
 Title:              Time Series Object
 Documentation:      Wed Nov 17 00:18:22 2010
  ..@ param      : Named chr [1:2] "AAPL" "daily"
  .. ..- attr(*, "names")= chr [1:2] "Instrument" "Frequency"
  ..@ title      : chr "Data Import from www.yahoo.com"
  ..@ description: chr "Wed Nov 17 00:18:22 2010 by user: "


and the class fWEBDATA stores the time series in a timeSeries object within
a richer structure which describes the source of the data in more detail. Similar
functionalities exist for FRED (fredSeries, fredImport) and OANDA (oandaSeries,
oandaImport). The fOptions package will soon be enhanced to allow
for downloading data from additional resources not included in other packages.

Another option is the use of the function get.hist.quote from package 
tseries which downloads data either from Yahoo! Finance or OANDA and returns 
a zoo object. The use is as simple as follows: 


R> x <- get.hist.quote("AAPL")
time series ends   2010-11-15
R> str(x)

'zoo' series from 1991-01-02 to 2010-11-15
  Data: num [1:5010, 1:4] 42.8 43.5 43 43 43.8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Open" "High" "Low" "Close"
  Index: Class 'Date' num [1:5010] 7671 7672 7673 7676 7677 ...

R> x <- get.hist.quote(instrument = "EUR/USD", provider = "oanda",
+     start = Sys.Date() - 300)

R> str(x)

'zoo' series from 2010-01-21 to 2010-11-16
  Data: num [1:300] 1.42 1.41 1.41 1.41 1.41 ...
  Index: Class 'Date' num [1:300] 14630 14631 14632 14633 14634 ...

We mention finally that the RBloomberg package allows for data fetching from
Bloomberg. RBloomberg only works on a Bloomberg workstation, using the
Desktop COM API. The user also needs to install the RDCOMClient package
or the rcom package for interprocess communication between R and the
Bloomberg workstation.

B.6 Bibliographical notes 

There is a growing interest in R in finance, but still few specific publications
are available at present. We already mentioned Würtz et al. (2009) and Würtz
et al. (2010) from the Rmetrics group. Other books like Ruppert (2006), Franke
et al. (2004) and Boland (2007) either contain code or have specialized support
web sites with R examples from the books. General time series analysis books
with applications to finance are Tsay (2005) and Carmona (2004).


References 

Boland, P. J. (2007). Statistical and Probabilistic Methods in Actuarial Science. 
Chapman & Hall/CRC, Boca Raton, FL. 

Carmona, R. (2004). Statistical Analysis of Financial Data in S-Plus. Springer, New York. 

Franke, J., Härdle, W., and Hafner, C. (2004). Statistics of Financial Markets: An 
Introduction. Springer, New York. 

Ruppert, D. (2006). Statistics and Finance: An Introduction. Springer, New York. 

Tsay, R. S. (2005). Analysis of Financial Time Series. Second Edition. John Wiley & Sons, 
Inc., Hoboken, NJ. 

Würtz, D., Chalabi, Y., Chen, W., and Ellis, A. (2009). Portfolio Optimization with 
R/Rmetrics. Finance Online Publishing, Zurich. 

Würtz, D., Chalabi, Y., Chen, W., and Ellis, A. (2010). A Discussion of Time Series 
Objects for R in Finance. Finance Online Publishing, Zurich. 


Index 


L^p integrability, 23 
Γ, 27 
χ², 28 
δ, 249 

δ-hedging, 249 
δ-method, 70 
γ Greek, 259 

F-stable convergence, 103 
ρ Greek, 259 
σ-algebra, 13 
θ Greek, 258 
φ-mixing, 102 

:, 396 
?, 394 

absorbing state, 97 
adapted, 81 
addBBands, 422 
additive, 194 
AEAsian, 270 
American option, 7, 285 
AmericanPutExp, 291 
AmericanPutImp, 296 
apply, 409 
arbitrage free, 223 
arbitrage opportunity, 229 
args, 429 
as, 405 

as.data.frame, 405 
as.Date, 426 
as.integer, 405 
as.ts, 425 
as.xts, 426 
as.zoo, 425 


asset price, 191 
assign, 397 

asymmetric double exponential, 210 
asymptotically unbiased estimator, 58 
asynchronous covariance estimator, 363 

attributes, 405 


backtest, 419 
basket option, 8, 278 
BAWAmericanApproxOption, 299 

Bayes’ rule, 16 
benchmark, 341 
Bermudan option, 286 
Bernoulli 
sample, 79 
Bessel function, 32 
besselK, 180 

best linear unbiased estimator, 59 
Beta, 30 
bias, 58 

bilateral gamma, 179 
binary, 252 
binomial, 25 
Borel, 51 
Brown, 104 

Brownian motion, 9, 104 
multidimensional, 145, 278 
translated, 232 
two-sided, 351 


Option Pricing and Estimation of Financial Models with R, First Edition. Stefano M. Iacus. 

© 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd. ISBN: 978-0-470-74584-7 





BSAmericanApproxOption, 300 

Burkholder-Davis-Gundy 

inequality, 89, 127 

c, 396 

cad-lag process, 108 
call option, 5 
Cantelli, 51 
cat, 393 
Cauchy, 30 

Cauchy functional equation, 135 
Cauchy-Schwarz-Bunyakovsky 
inequality, 47 

cce, 364 
CGMY 
process, 209 
change point, 350 
Chapman-Kolmogorov equation, 95, 98 

character, 397 
characteristic 
exponent, 38, 40 
function, 23 

de Finetti, 132 
triplet, 38 

charSeries, 421 
chartSeries, 274 
Chebyshev inequality, 46 
Chebyshev-Cantelli inequality, 46 
Chebyshev-Markov inequality, 46 
clusterSetupRNG, 412 
coefficient 
diffusion, 128 
drift, 128 

compensated Poisson random measure, 143 
compensator, 87, 113 
complementary, 13 
complete additivity, 14 
completeness, 223 
complex conjugate, 23 
complex, 398 

compound Poisson process, 110 
conditional 


expectation, 55 
probability, 15 
confint, 70 
consistent estimator, 60 
contingent claim, 221 
continuous 

mapping theorem, 50 
random variable, 19 
time 

Markov chain, 99 
convergence 
almost sure, 49 
in r-th mean, 49 
in distribution, 48 
in probability, 48 
mean square, 49 
weak, 48 
convolution, 35 
COS, 329 

counting process, 108 
covariance, 22 

asynchronous estimator, 363 
function, 84 
CPoint, 359 
cpoint, 275, 350, 352 
CRAN, 395 

CreditMetrics, 419 
cumulative distribution function, 19 

data.frame, 401 
dbeta, 31 
dbinom, 25 
dcauchy, 30 
dcCIR, 201 
dchisq, 29 

de Finetti characteristic function, 132 

De Morgan’s laws, 15 
delta Greek, 249 
deltat, 423 
density 

function, 19 
method, 252 
transition, 107 
dependence, 79 




dexp, 27 
dgamma, 28 
dgh, 33 
diffusion 

coefficient, 128 
equation, 108 
process, 128 
digital option, 252 
Dirac delta, 38, 117, 331 
discrete 

Fourier transform, 43 
random variable, 19 
dist, 378 

distribution function, 19 
dlnorm, 31 
dmvnorm, 35 
dnig, 32 
dnorm, 28 
Doob 

maximal L^2 inequality, 88 
maximal L^p inequality, 88 
maximal inequality, 88 
Doob-Meyer decomposition, 87 
dopar, 243, 413 
doSNOW, 244, 414 
doubling strategies, 229 
dpois, 25 
drift, 8, 192 
coefficient, 128 
dsCIR, 201 
dstable, 41 
dt, 29 
dtw, 378 
dunif, 26 

early exercise premium, 297 
efficiency, 63 
end, 423 
Epps effect, 363 
equivalent measure, 17, 261 
Esscher transform, 315 
estimating function, 66 
estimator 

consistence, 60 
EUCdist, 378 


Euler’s formula, 73 
European option, 7, 222 
events, 14 
elementary, 14 
exercise price, 3 
existence of solutions, 129 
exotic option, 7 
expected value, 21 
expiry date, 3 
exponential 
law, 26 
martingale, 86 
extension space, 103 

fair value, 4 

family of experiments, 57 
fAsianOptions, 272, 420 
fast Fourier transform, 43 
fBasics, 205, 420 
fExoticOptions, 272, 420 
filtration, 81 
natural, 81 
fImport, 273 
finite dimensional 
distribution, 80 
first passage time, 89 
first-order variation, 83 
Fisher information, 62 
fOptions, 240, 260, 299, 420 
for, 410 

foreach, 243, 413 
format, 429 
forward, 2 
fPortfolio, 419 
fraction 

optimal, 343 
fractions, 342 
FRED, 442 
fredImport, 444 
fredSeries, 444 
frequency, 423 
fTrading, 419 
function, 408 
future, 2 
fWEBDATA, 443 




Galton, 31 
gamma, 179, 259 
GBSCharacteristics, 260 
GBSOption, 240, 321 
GBSVolatility, 275 
generalized 
hyperbolic 

process, 207 

tempered stable process, 209 
generalized hyperbolic 
distribution, 32 
generating function 
moment, 24 
geometric 

Brownian motion, 8, 191, 210 
translated, 232 
telegraph process, 211 
get.zoo.data, 178 
getDoParWorkers, 244, 414 
getSymbols, 421, 442 
GH, 32 
process, 207 
ghFit, 205 
Girsanov, 130, 141 
global Lipschitz condition, 129 
Google, 442 
grad, 165 
Greek 
delta, 249 
gamma, 259 
kappa, 259 
rho, 259 
theta, 258 
vega, 259 
growth 

optimal portfolio, 341 
rate, 342 

Hölder inequality, 47 
Hájek-Rényi 
inequality, 88 

Hayashi-Yoshida estimator, 363 
head, 404 

heat equation, 108, 233 


hedge ratio, 251 
hedging, 6, 223 
help, 394 
help.search, 394 
hessian, 165 
Hessian matrix, 75 
holder, 3, 223 
homogeneous 

Poisson process, 110 
hybrid diffusion system, 147 
hypFit, 205 

i.i.d., 24, 25, 57 
IBrokers, 419 
implied volatility, 275 
independent, 16, 20 
index, 425 
index of stability, 40 
inequality 

Burkholder-Davis-Gundy, 89, 127 
Cauchy-Schwarz-Bunyakovsky, 47 

Chebyshev, 46 
Chebyshev-Cantelli, 46 
Chebyshev-Markov, 46 
Doob 

maximal, 88 
maximal L^2, 88 
maximal L^p, 88 
Hölder, 47 
Hájek-Rényi, 88 
Jensen, 47, 57 
Kolmogorov, 47, 89 
infinite activity, 138 
infinitely divisible, 37 
infinitesimal generator, 100 
information, 19 
install.packages, 395 
instantaneous state, 101 
integer, 397 

integrable random variables, 21 
intensity function, 110 
invariant distribution, 97 




inverse image, 18 
inversion theorem, 43 
irts, 427 
ISOdate, 429 

Itô 
formula, 124, 126, 146 
integral, 9, 119 
process, 123 
Itô-Lévy 

decomposition, 137 
process, 143 

Jensen’s inequality, 47, 57 
jump diffusion, 134 

kappa, 259 
Kolmogorov 
inequality, 47, 89 
Kou's process, 210 

L-BFGS-B, 70 
Lévy, 39 
exponent, 38 
jump diffusion, 134 
measure, 133, 136 
process, 133 

Lévy-Khintchine formula, 38 
Lamperti transform, 130 
lapply, 409 
large numbers 
strong law, 51 
weak law, 51 
lasso, 371 

law of total probability, 16 
least squares method, 66 
levels, 405 
Lewis method, 170 
likelihood, 61 
Lindeberg’s condition, 53 
linear 

growth condition, 129 
list, 399 

listFinCenter, 432 
lm, 398 


load, 394 
local martingale, 131 
localizing sequence, 131 
location, 40 
log-likelihood, 61 
log-normal, 31, 193 
log-returns, 193 
logLik, 69 
Lorentz, 30 
lower, 70 
ls, 396 
LSM, 310 

makeCluster, 411 
makeSOCKcluster, 411 
marked point process, 109 
market price of risk, 343 
Markov 
chain, 92 

continuous time, 99 
operator, 375 
process, 91 
property 

strong, 96, 99 
switching, 147, 183 
marks, 109 
martingale, 84 
exponential, 86 
local, 131 
measure, 261 
MASS, 35 
matrix, 396 
MatrixExp, 101 
maximum likelihood estimator, 64 
MCAsian, 268 
MCdelta, 256 
MCdelta2, 257 
MCPrice, 242, 243 
mean 

reverting, 200 
square error, 58 
measurability, 18 
measurable function, 18 




measure 

equivalent, 17 
Lévy, 136 
martingale, 261 
of probability, 14 
random, 136 
risk neutral, 225, 261 
Meixner 

distribution, 33 
process, 208 
memoryless, 27 
merge, 437 
method, 70, 406 
method of moments, 65 
midori, 248 
mixed moment, 22 
mixing property, 101 
MLE, 64 

mle, 68, 169, 202, 406 
modified Bessel function, 117, 331 
MOdist, 377 
moment, 21 

generating function, 24, 86 
mixed, 22 
MSE, 58 
msm, 101, 185 
mts, 423 
multicore, 244 
multicore, 413 
multidimensional 

Brownian motion, 145 
geometric Brownian motion, 278 
mvrnorm, 35 

Newton-Raphson, 167 
NIG, 32, 179 
process, 208 
nigFit, 205 
nlm, 167 
non-arbitrage, 6 
nonhomogeneous 
Poisson process, 110 
normal 

gamma, 179 

inverse Gaussian, 32, 179 


normal inverse Gaussian 
process, 208 
NULL, 397 
numDeriv, 165 
numeraire, 341 
numeric, 396 
nws, 412 

OANDA, 442 
oandaImport, 444 
oandaSeries, 444 
opefimor, 1 
optim, 168 
option, 2 
American, 7, 285 
basket, 8, 278 
Bermudan, 286 
binary, 252 
call, 5 
digital, 252 
European, 7, 222 
exotic, 7 

path-dependent, 7 
put, 5 
vanilla, 3 

Ornstein-Uhlenbeck, 152 

par, 394 
path, 80 

path-dependent option, 7 
payoff, 5 
function, 221 
pbeta, 31 
pbinom, 25 
pcauchy, 30 
pcCIR, 201 
pchisq, 29 

PerformanceAnalytics, 419 

pexp, 27 

pgamma, 28 

pgh, 33 

plnorm, 31 

pmvnorm, 35 

pnig, 32 




pnorm, 28 
point process, 108 
Poisson, 25 
process, 109 
random measure, 136 
compensated, 143 
random variable, 25 
polyroot, 166 
portfolio 
hedging, 229 
self-financing, 228 
strategy, 228 
portfolio, 419 
POSIXct, 429 
POSIXlt, 429 
ppois, 25 
predictable, 83 
probability, 14 
measure, 14 
space, 14 

extension, 103 
transition, 92, 98 
Process 
Kou, 210 
process 

Brownian, 104 
CGMY, 209 
continuous time, 80 
counting, 108 
diffusion, 128 
discrete time, 79 
gamma, 179 
generalized 

hyperbolic, 207 
tempered stable, 209 
Lévy, 133 
Markov, 91 

switching, 147, 183 
Meixner, 208 

normal inverse Gaussian, 208 
point, 108 

marked, 109 
Poisson, 109 
predictable, 83 
telegraph, 114 


tempered stable, 207 
Variance Gamma, 179, 208, 323 
Wiener, 104 
psCIR, 201 
pstable, 41 
pt, 29 
punif, 26 
put option, 5 
put-call parity, 239 

qbeta, 31 
qbinom, 25 
qcauchy, 30 
qcCIR, 201 
qchisq, 29 
qexp, 27 
qgamma, 28 
qgh, 33 
qlnorm, 31 
qmle, 198 
qmvnorm, 35 
qnig, 32 
qnorm, 28 
qpois, 25 
qsCIR, 201 
qstable, 41 
qt, 29 
quadratic 
variation, 83 
quantile, 20 
quantmod, 274 
qunif, 26 
quotes, 440 

R-Forge, 395 
Radon-Nikodym, 20, 130, 141 
random 

experiment, 14 
measure, 136 
variable, 18 

Bernoulli, 24 
Beta, 30 
Binomial, 25 
Cauchy, 30 
Chi-square, 28 


continuous, 19 
discrete, 19 
exponential, 26 
gamma, 27 
Gaussian, 28 

generalized hyperbolic, 32 
inverse Gaussian, 32 
Lévy, 39 
log-normal, 31 
Meixner, 33 
Poisson, 25 
Student’s t, 29 
uniform, 26 
walk, 85, 92 
rate 

Poisson process, 110 
rates, 101 
rbeta, 31 
rbind, 434 
rbinom, 25 
RBloomberg, 444 
rc.hy, 364 
rcauchy, 30 
rcCIR, 201 
rchisq, 29 
realized, 364 
registerDoSEQ, 246, 415 
registerDoSNOW, 414 
return, 408 
returns, 274 
rev, 439 
rexp, 27 
rgamma, 28, 180 
rgh, 33 
rho, 259 

Richardson’s extrapolation, 164 
risk 

free strategy, 6 
neutral measure, 225, 261 
rlnorm, 31 
Rmetrics, 240, 420 
Rmpi, 412 
rmvnorm, 35, 180 
rnig, 32 


rnorm, 28 

RollGeskeWhaleyOption, 300 

roundoff error, 163 

rpois, 25 

rpvm, 412 

RQuantLib, 420 

rsCIR, 201 

RSiteSearch, 394 

rstable, 41 

rt, 29 

runif, 26 


sample 
space, 14 
sapply, 409 
save, 394 
save.image, 394 
scale, 40 

score function, 62 
sde, 175, 275, 377, 395 
sde.sim, 175, 201 
semimartingale, 131 
setModel, 178 
sfInit, 413 
sfStop, 246, 413, 415 
showMethods, 407 
sigma-algebra, 13 
sign function, 40 
simMarkov, 184 
simMSdiff, 185, 335 
simulate, 178 
skewness, 40 
Slutsky’s theorem, 50 
snow, 411 

snowfall, 244, 412 
solution 

existence and uniqueness, 129 
strong, 128 
weak, 128 
solve, 294 
sort, 439 

square integrability, 23 
stable, 179 
convergence, 103 




in distribution, 148 
law, 38 

stableFit, 205 
standard 
error, 160 
normal, 28 
start, 423 
state space, 80 
stats4, 68, 406 
stochastic 

differential equation, 9, 128 
exponential, 326 
integral, 9, 119, 121 
process, 79 

covariance function, 84 
increments, 84 
quadratic variation, 83 
total variation, 83 
stochastically continuous, 133 
stopCluster, 412 
stopping time, 89 
str, 398 

strike price, 3, 222 
strong 

law of large numbers, 51 
Markov property, 96, 99 
solution, 128 
strptime, 430 
subsampling, 366 
summary, 70 
Sys.getlocale, 431 
Sys.setlocale, 431 

tail, 404 
telegraph 
equation, 115 
process, 114 

geometric, 211 
tempered stable, 179 
process, 207 
theta, 258 
thinning, 170 
time, 423 

timeDate, 420, 428 
timeSeries, 274, 428 


total variation, 83 
trajectory of a process, 79 
transform 
Esscher, 315 
Lamperti, 130 
transient, 97 
transition 
density, 107 
matrix, 92 
probability, 92, 98 
translated 

Brownian motion, 232 
geometric Brownian motion, 232 
trivial σ-algebra, 13 
truncation error, 163 
TS 

process, 207 
ts, 423 
tseries, 427 
TTR, 419 
ttrTests, 419 
TurnbullWakemanAsianApproxOption, 272 
two-sided Brownian motion, 351 

unbiased estimator, 58 
underlying asset, 3 
uniform, 26 

uniqueness of solutions, 129 
uniroot, 166 

vanilla, 3 
VaR, 419 
variance, 21 

variance-covariance matrix, 22 
variance gamma 
process, 208 
vcov, 69 
vector, 397 
vega, 259 

velocity process, 114 
VG, 179, 208, 323 
volatility, 4, 8, 192 
matrix, 278 
smile, 276 




volatility 
implied, 275 
Vsmc, 338 

wave equation, 115 
weak 

convergence, 48 
law of large numbers, 51 
solution, 128 
while, 410 
Wiener, 104 
process, 9 
wild, 168 


window, 424 
writer, 3, 223 

xts, 426 

Yahoo, 442 
yahooImport, 443 
yahooSeries, 273, 443 
yuima, 177, 270 
yuima-data, 178 

zoo, 178, 424 
zooreg, 425 


